Abstract

We develop the argument that the Gibbs-von Neumann entropy is the appropriate statistical mechanical generalisation of the thermodynamic entropy, for macroscopic and microscopic systems, whether in thermal equilibrium or not, as a consequence of Hamiltonian dynamics. The mathematical treatment utilises well known results [Gib02, Tol38, Weh78, Par89] but, most importantly, incorporates a variety of arguments on the phenomenological properties of thermal states [Szi25, TQ63, HK65, GB91] and of statistical distributions [HG76, PW78, Len78]. This enables the identification of the canonical distribution as the unique representation of thermal states without approximation or presupposing the existence of an entropy function. The Gibbs-von Neumann entropy is then derived from arguments based solely on the addition of probabilities to Hamiltonian dynamics.
1 Introduction



The laws of statistical mechanics apply to conservative systems of any number of degrees 
of freedom, and are exact. [Gib02] 

The statistical mechanics considered by Gibbs, in his classic treatise of 1902, is a more general 
structure than thermodynamics. It applies to any kind of Hamiltonian system in which probabilistic 
reasoning is valid. Useful properties may be derived without any reference to thermodynamic 
quantities, and can be used without any consideration of whether or not one is dealing with a 
thermal system. 

Nevertheless, thermal systems exist and are important, and statistical mechanics must give an account of them, including the phenomena usually described by thermodynamics. Making this connection without introducing "question begging" assumptions is surprisingly hard. Gibbs tentatively attempted to make this connection in [Gib02, Chapter XIV], but referred only to thermodynamic "analogies".

Criticisms of the Gibbs approach, and of the Gibbs entropy as a thermodynamic entropy, are not hard to find these days [Cal99, She99, Gol01, Alb01] and go back at least as far as [EE12]. The principal purpose of this paper is to argue that the Gibbs-von Neumann entropy¹ can be derived from physical arguments, without problematical assumptions, as precisely what one should desire of a statistical generalisation of thermodynamic entropy.

The method will not be to attempt directly to find a statistical mechanical term to act as a thermodynamic entropy. There is no uncontroversial definition of entropy outside classical phenomenological equilibrium thermodynamics, yet the world is not in equilibrium and statistical fluctuations occur. Instead, the approach will be to develop statistical mechanics as a broader subject than thermal physics. When applying statistical mechanics to thermal phenomena, we will consider some basic physical properties of thermal states and then apply those properties to statistical mechanics.

Of central importance, and what differs most from more traditional treatments, will be the justification for deriving the canonically distributed density matrix as the unique statistical distribution that can represent thermal states. Here the key arguments will be Szilard's 1925 derivation of the canonical distribution from phenomenological grounds [Szi25], the concept of passive distributions [PW78, Len78, Sew80], and their relationship to the adiabatic availability of energy [HK65, HG76, GB91]. An interesting feature of these arguments is that they do not depend upon whether the system is large or small, or in thermal equilibrium or not. They apply to any situation in which the use of probability distributions is valid.

We will first explore the mathematical structure of Gibbs statistical mechanics as applied to quantum mechanics. The mathematics will, in large part, be recognisable [Gib02, Tol38, Weh78, Par89], but the emphasis will be on showing what can (and cannot) be derived without making physical assumptions. This section will, of necessity, appear rather abstract and unmotivated.

The structure so developed produces equations very like those that occur in thermodynamics. 
Actually connecting these equations with thermodynamics requires a logical jump which appears 
to assume precisely the thing which one seeks to justify. We will briefly review why this is so and 
some of the attempts to make this jump. 

We then return to the physical basis of the statistical approach. The statistical approach applies whenever it is meaningful to use probabilities. Deciding when this is so is itself controversial, but we will not address that problem here. Instead we will explore the consequences when a probabilistic description is meaningful. A remarkably large amount of the familiar structure of statistical mechanics can be derived without any reference to thermal concepts or notions of entropy. Particular attention will be drawn to the closely related concepts of adiabatic availability and passive distributions.

¹ We will be working with quantum mechanical systems, so will derive the von Neumann entropy.

Only after we have derived the general structure of quantum statistical mechanics will we consider thermal systems. To examine the statistical mechanics of thermal systems, we first need to identify what we mean by a thermal system. We identify four physical properties which, we suggest, are observed properties of thermal states. These properties uniquely select the canonical probability distribution. If this argument is accepted, we then proceed in well-established steps to develop thermal heat baths, the temperature scale and, finally, the form of a statistical mechanical entropy from physically motivated arguments.

These physical arguments are valid for systems of any size, for non-equilibrium systems as well 
as for equilibrium systems, indeed for any situation where the use of a probability distribution is 
physically justified. 



2 Mathematical formalism 

We establish the properties of a particular type of function, which for want of a better word we shall call a distribution, on the state space of a system that has a Hamiltonian evolution. No physical interpretation is placed upon either this type of function or its derived properties. The object is to establish exactly which purely mathematical properties can be defined without introducing physical justifications.

This section will, perhaps, seem needlessly abstract and physically unmotivated. This is quite correct! We develop the mathematics first to ensure that no physical assumptions have been used in its derivation. This is to try to avoid circular reasoning when we come to consider the appropriate descriptions of physical processes.

When we do start to identify physical processes with mathematical structures, we wish to be clear which properties legitimate that identification, which properties then follow directly from that identification, and which properties require further assumptions or justification. Readers who are prepared to take this on trust may jump directly to Section 4, where we will start considering the properties of physical systems.



2.1 Distributions 

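The quantum mechanical state space is a Hilbert space ℋ with a Hamiltonian evolution operator H(t). For the purposes of this paper, a distribution on the state space is an operator Ω on the state space, with orthonormal eigenstates |p⟩ and real eigenvalues ω_p, for which:

$$\Omega = \sum_p \omega_p\, |p\rangle\langle p| \tag{1}$$

$$\omega_p \geq 0 \tag{2}$$

$$\sum_p \omega_p = 1 \tag{3}$$

$$i\hbar \frac{\partial \Omega}{\partial t} = [H, \Omega]_- \tag{4}$$

The eigenstates may evolve in time due to H; because Equation (4) generates a unitary evolution, the eigenvalues ω_p remain constant. We may also write the time evolution of the distribution in the unitary form:

$$\Omega(t) = U(t, t')\, \Omega(t')\, U^\dagger(t, t') \tag{5}$$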
where U is the solution of the operator equation

$$i\hbar \frac{\partial U(t, t')}{\partial t} = H(t)\, U(t, t'), \qquad U(t', t') = I \tag{6}$$

or in the more general superoperator form:

$$\Omega(t) = \mathcal{L}(t, t')\,[\Omega(t')] \tag{7}$$

We will refer to the combination of a state space ℋ, a Hamiltonian evolution H on the state space, and a distribution Ω over the state space as a system.
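A minimal NumPy sketch of Equations (2)-(5), assuming ħ = 1, a time-independent Hamiltonian and randomly generated matrices, is given below. The helper names (`random_hermitian`, `evolve`, `is_distribution`) are illustrative only. It checks that a unitarily evolved distribution is still a distribution, that its eigenvalues ω_p are unchanged, and that it satisfies Equation (4) to finite-difference accuracy.

```python
import numpy as np


def random_hermitian(d, rng):
    """Random Hermitian matrix, used here as a stand-in Hamiltonian."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2


def evolve(omega, h, t):
    """Omega(t) = U Omega(0) U^dagger with U = exp(-i H t); hbar = 1, H time-independent."""
    e, v = np.linalg.eigh(h)
    u = v @ np.diag(np.exp(-1j * e * t)) @ v.conj().T
    return u @ omega @ u.conj().T


def is_distribution(omega, tol=1e-10):
    """Hermitian, with non-negative eigenvalues (eq. 2) that sum to one (eq. 3)."""
    if not np.allclose(omega, omega.conj().T, atol=tol):
        return False
    w = np.linalg.eigvalsh(omega)
    return w.min() > -tol and abs(w.sum() - 1.0) < tol


rng = np.random.default_rng(0)
d = 4
h = random_hermitian(d, rng)
p = rng.random(d)
p /= p.sum()                                        # eigenvalues omega_p
basis = np.linalg.qr(rng.normal(size=(d, d)))[0]    # orthonormal eigenstates |p>
omega0 = basis @ np.diag(p) @ basis.T

t, dt = 1.7, 1e-5
omega_t = evolve(omega0, h, t)
# Central difference of i dOmega/dt, compared with the commutator [H, Omega] (eq. 4)
lhs = 1j * (evolve(omega0, h, t + dt) - evolve(omega0, h, t - dt)) / (2 * dt)
rhs = h @ omega_t - omega_t @ h

print(is_distribution(omega_t))                                         # True
print(np.allclose(np.sort(np.linalg.eigvalsh(omega_t)), np.sort(p)))    # True
print(np.allclose(lhs, rhs, atol=1e-6))                                  # True
```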

2.1.1 Subdistributions 

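A subdistribution is the normalised portion of a distribution that is non-zero over a restricted region R ⊂ ℋ of the state space:

$$\Omega_R = \frac{\sum_{\beta \in R} \omega_\beta\, |\beta\rangle\langle\beta|}{\sum_{\beta \in R} \omega_\beta} \tag{8}$$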

Two subdistributions are non-overlapping if there is no region of the state space for which they are both non-zero:

$$\Omega_i \Omega_j = \delta_{ij}\, \Omega_i^2 \tag{9}$$

A distribution may be decomposed into non-overlapping subdistributions:

$$\Omega = \sum_i W_i\, \Omega_i \tag{10}$$

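It will be useful to do this by constructing a complete set of non-overlapping projectors K_i, such that

$$K_i K_j = \delta_{ij} K_i \tag{11}$$

$$\sum_i K_i = I \tag{12}$$

constructed from the eigenstates of the distribution:

$$\Omega = \sum_\alpha \omega_\alpha\, |\alpha\rangle\langle\alpha| \tag{13}$$

$$\langle \alpha | \alpha' \rangle = \delta_{\alpha\alpha'} \tag{14}$$

$$K_i = \sum_{\alpha \in i} |\alpha\rangle\langle\alpha| \tag{15}$$

$$\Omega_i = \frac{K_i\, \Omega\, K_i}{W_i} \tag{16}$$

$$W_i = \mathrm{Tr}\left[K_i\, \Omega\, K_i\right] \tag{17}$$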
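The projector construction of Equations (11)-(17) can be sketched numerically with NumPy. The example assumes a real six-dimensional distribution with a randomly drawn eigenbasis and a hypothetical grouping of its eigenstates into three sets; it then checks Equations (9), (11) and (12) and the recombination of Equation (10).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
basis = np.linalg.qr(rng.normal(size=(d, d)))[0]        # columns: orthonormal |alpha> (eq. 14)
weights = rng.random(d)
weights /= weights.sum()                                 # omega_alpha
omega = basis @ np.diag(weights) @ basis.T               # eq. (13)

groups = [[0, 1], [2], [3, 4, 5]]                        # hypothetical partition of the eigenstates
projectors = [sum(np.outer(basis[:, a], basis[:, a]) for a in g) for g in groups]   # eq. (15)
W = [float(np.trace(k @ omega @ k)) for k in projectors]                             # eq. (17)
subdists = [k @ omega @ k / w for k, w in zip(projectors, W)]                       # eq. (16)

n = len(groups)
pairs = [(i, j) for i in range(n) for j in range(n)]
complete = np.allclose(sum(projectors), np.eye(d))                                   # eq. (12)
orthogonal = all(np.allclose(projectors[i] @ projectors[j],
                             projectors[i] if i == j else 0.0) for i, j in pairs)    # eq. (11)
non_overlapping = all(np.allclose(subdists[i] @ subdists[j],
                                  subdists[i] @ subdists[i] if i == j else 0.0)
                      for i, j in pairs)                                             # eq. (9)
recombined = np.allclose(sum(w * s for w, s in zip(W, subdists)), omega)             # eq. (10)
print(complete, orthogonal, non_overlapping, recombined)    # True True True True
```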



2.1.2 Subspaces 

When the Hilbert space can be separated into a product of two subspaces, ℋ = ℋ_1 ⊗ ℋ_2, we form the marginal distributions

$$\Omega_1 = \mathrm{Tr}_2[\Omega] \tag{18}$$

$$\Omega_2 = \mathrm{Tr}_1[\Omega] \tag{19}$$

The marginal distributions do not generally evolve by a Hamiltonian evolution, but the evolution may still be expressed by a superoperator equation:

$$\Omega_1(t) = \mathcal{L}_1(t)\,[\Omega_1(0)] \tag{20}$$
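The marginals of Equations (18) and (19) are partial traces. The following NumPy sketch (the function name `marginal` and the index layout are assumptions of this example) computes them by reshaping the joint matrix, and checks that the marginals are distributions, that an operator of the form A_1 ⊗ I_2 has the same value for Ω and for Ω_1, and that an uncorrelated product returns its factors.

```python
import numpy as np


def marginal(omega, d1, d2, keep):
    """Marginal of a distribution on H1 (x) H2: keep=1 gives Tr_2 (eq. 18), keep=2 gives Tr_1 (eq. 19)."""
    t = omega.reshape(d1, d2, d1, d2)      # t[a, j, b, k] = <a j| Omega |b k>
    if keep == 1:
        return np.einsum('ajbj->ab', t)
    return np.einsum('iaib->ab', t)


rng = np.random.default_rng(2)
d1, d2 = 2, 3
g = rng.normal(size=(d1 * d2, d1 * d2)) + 1j * rng.normal(size=(d1 * d2, d1 * d2))
omega = g @ g.conj().T
omega /= np.trace(omega).real              # a generic, correlated joint distribution

omega1 = marginal(omega, d1, d2, keep=1)
omega2 = marginal(omega, d1, d2, keep=2)

# The marginals are themselves distributions
for m in (omega1, omega2):
    print(np.isclose(np.trace(m).real, 1.0), np.linalg.eigvalsh(m).min() > -1e-12)

# An operator A1 (x) I2 has the same value for Omega and for Omega_1
a1 = rng.normal(size=(d1, d1))
a1 = a1 + a1.T
print(np.isclose(np.trace(np.kron(a1, np.eye(d2)) @ omega), np.trace(a1 @ omega1)))

# For an uncorrelated product, the marginals recover the factors
print(np.allclose(marginal(np.kron(omega1, omega2), d1, d2, keep=1), omega1))
```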
2.2 Operators

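Given an operator A(t) on the state space, we may define the value of that operator for the distribution Ω(t) by:

$$\langle A(t)\rangle_\Omega = \mathrm{Tr}\left[A(t)\, \Omega(t)\right] \tag{21}$$

Given the Hamiltonian evolution operator H(t) for the state space, Equation (4) and the cyclic property of the trace give

$$\frac{d\langle H(t)\rangle_\Omega}{dt} = \left\langle \frac{\partial H(t)}{\partial t}\right\rangle_\Omega \tag{22}$$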
If the system is not isolated, then it is a subsystem ℋ_1 of a larger space ℋ = ℋ_1 ⊗ ℋ_2. The Hamiltonian may be rewritten as the sum of three terms:

1. a term operating solely upon subsystem 1, H_1(t);

2. a term operating solely upon subsystem 2, H_2(t);

3. and a term operating jointly, as an interaction between the two systems, V_12(t).

$$H(t) = H_1(t) \otimes I_2 + I_1 \otimes H_2(t) + V_{12}(t) \tag{23}$$



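Now the evolution of the marginal distribution Ω_1(t) = Tr_2[Ω(t)] will not, in general, be describable by a Hamiltonian evolution operator.

If we take an operator A_1(t) that acts solely upon the space of the subsystem ℋ_1, we find:

$$\frac{d\langle A_1(t)\otimes I_2\rangle_\Omega}{dt} = \left\langle \frac{\partial A_1(t)}{\partial t}\right\rangle_{\Omega_1(t)} + \frac{i}{\hbar}\left\langle [H_1(t), A_1(t)]_-\right\rangle_{\Omega_1(t)} + \frac{i}{\hbar}\left\langle [V_{12}(t), A_1(t)\otimes I_2]_-\right\rangle_{\Omega(t)} \tag{24}$$

Unless [A_1(t) ⊗ I_2, V_12(t)]_- = 0, we appear to have a dependency upon the full distribution Ω(t). To eliminate this, we express the evolution of the marginal distribution through the evolution of its eigenstates and eigenvalues:

$$\Omega_1(t) = \sum_\alpha \omega_\alpha(t)\, |\alpha(t)\rangle\langle\alpha(t)| \tag{25}$$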

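The set of eigenstates {|α(t)⟩} will always be a basis for the subspace, so there exists a unitary operator T(t) for which

$$|\alpha(t)\rangle = T(t)\, |\alpha(0)\rangle$$

and whose evolution is generated by a Hamiltonian operator Θ(t):

$$i\hbar \frac{dT(t)}{dt} = \Theta(t)\, T(t)$$

This gives

$$\frac{d\langle A_1(t)\rangle_{\Omega_1}}{dt} = \left\langle \frac{\partial A_1(t)}{\partial t}\right\rangle_{\Omega_1(t)} + \sum_\alpha \langle\alpha(t)|A_1(t)|\alpha(t)\rangle \frac{d\omega_\alpha(t)}{dt} + \frac{i}{\hbar}\left\langle [\Theta(t), A_1(t)]_-\right\rangle_{\Omega_1(t)} \tag{26}$$

which is expressed purely in terms of operators upon, and a distribution over, the subspace.

Note that if either of the commutators [A_1(t), Θ(t)]_- or [Θ(t), Ω_1(t)]_- is zero, the third term disappears to give:

$$\frac{d\langle A_1(t)\rangle_{\Omega_1}}{dt} = \left\langle \frac{\partial A_1(t)}{\partial t}\right\rangle_{\Omega_1(t)} + \sum_\alpha \langle\alpha(t)|A_1(t)|\alpha(t)\rangle \frac{d\omega_\alpha(t)}{dt} \tag{27}$$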






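The following two terms will be useful later on:

$$\Delta A_1(t) = \int_0^t \frac{d\langle A_1(t')\rangle_{\Omega_1(t')}}{dt'}\, dt' = \langle A_1(t)\rangle_{\Omega_1(t)} - \langle A_1(0)\rangle_{\Omega_1(0)} \tag{28}$$

$$D[A_1(t)] = \int_0^t \left\langle \frac{\partial A_1(t')}{\partial t'}\right\rangle_{\Omega_1(t')} dt' \tag{29}$$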
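Equation (24) holds at every instant and can be checked with random matrices. In this NumPy sketch (ħ = 1, A_1 taken to be time-independent, all matrices random and purely illustrative) the rate of change of ⟨A_1 ⊗ I_2⟩_Ω obtained from the full evolution of Equation (4) is compared with the marginal term plus the interaction term. The interaction term is generally non-zero and needs the full Ω, which is the dependency noted in the text after Equation (24).

```python
import numpy as np


def random_hermitian(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2


def commutator(x, y):
    return x @ y - y @ x


rng = np.random.default_rng(3)
d1, d2 = 2, 3
i1, i2 = np.eye(d1), np.eye(d2)
h1, h2 = random_hermitian(d1, rng), random_hermitian(d2, rng)
v12 = random_hermitian(d1 * d2, rng)                     # generic interaction term
h = np.kron(h1, i2) + np.kron(i1, h2) + v12              # eq. (23)

g = rng.normal(size=(d1 * d2, d1 * d2)) + 1j * rng.normal(size=(d1 * d2, d1 * d2))
omega = g @ g.conj().T
omega /= np.trace(omega).real
omega1 = np.einsum('ajbj->ab', omega.reshape(d1, d2, d1, d2))    # Tr_2[Omega]

a1 = random_hermitian(d1, rng)                           # time-independent A_1
a_full = np.kron(a1, i2)

# Full rate: d<A>/dt = Tr[A dOmega/dt], with dOmega/dt = -i [H, Omega]
rate_full = np.trace(a_full @ (-1j * commutator(h, omega)))
# Right-hand side of eq. (24); the dA_1/dt term is zero here
marginal_term = 1j * np.trace(commutator(h1, a1) @ omega1)
interaction_term = 1j * np.trace(commutator(v12, a_full) @ omega)

print(np.isclose(rate_full, marginal_term + interaction_term))   # True
print(interaction_term)                                          # generally non-zero
```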



2.3 Gibbs-von Neumann measure 

We now introduce the Gibbs-von Neumann measure of a distribution:

$$G[\Omega] = \mathrm{Tr}\left[\Omega \ln[\Omega]\right] \tag{30}$$



It may be thought that introducing this particular measure is a "question begging" step. We suggest that this is not the case, as we have placed no physical interpretation upon this measure. We introduce it simply to establish some of its mathematical properties, devoid of any interpretation.

2.3.1 Concavity 

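The Gibbs-von Neumann measure is a convex function of the distribution (equivalently, −G is concave), and this has the consequence that, given any two distributions Ω and Ω',

$$\mathrm{Tr}\left[\Omega\left(\ln[\Omega] - \ln[\Omega']\right)\right] \geq 0 \tag{31}$$

with equality if, and only if, Ω = Ω'.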
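Equation (31) is Klein's inequality. As a NumPy sketch, with randomly generated full-rank distributions (the Ginibre construction and the helper names are assumptions of the example, not part of the text), the following checks that the left-hand side of Equation (31) is non-negative, that it vanishes when Ω' = Ω, and that G satisfies the convexity inequality.

```python
import numpy as np


def random_density_matrix(d, rng):
    """Full-rank random distribution (Ginibre construction)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def log_hermitian(rho):
    """Matrix logarithm of a positive-definite Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    return v @ np.diag(np.log(w)) @ v.conj().T


def gibbs_measure(rho):
    """G[rho] = Tr[rho ln rho] (eq. 30), using 0 ln 0 = 0."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(np.sum(w * np.log(w)))


def klein_gap(omega, omega_prime):
    """Left-hand side of eq. (31)."""
    return np.trace(omega @ (log_hermitian(omega) - log_hermitian(omega_prime))).real


rng = np.random.default_rng(4)
pairs = [(random_density_matrix(4, rng), random_density_matrix(4, rng)) for _ in range(500)]
gaps = [klein_gap(a, b) for a, b in pairs]
# Convexity of G: G[0.3 a + 0.7 b] <= 0.3 G[a] + 0.7 G[b]
convex = all(gibbs_measure(0.3 * a + 0.7 * b) <= 0.3 * gibbs_measure(a) + 0.7 * gibbs_measure(b) + 1e-12
             for a, b in pairs)
print(min(gaps) >= 0.0, convex)                             # True True
print(abs(klein_gap(pairs[0][0], pairs[0][0])) < 1e-10)     # True
```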



2.3.2 Subspaces 



When a space can be separated into two subspaces, ℋ = ℋ_1 ⊗ ℋ_2, we can define a measure of the correlation of the distribution between the subspaces as:

$$C[\Omega] = G[\Omega] - G[\Omega_1] - G[\Omega_2] \geq 0 \tag{32}$$

Equality occurs if, and only if, the systems are uncorrelated, Ω = Ω_1 ⊗ Ω_2.

This has a direct consequence for the evolution of initially uncorrelated systems that are allowed to interact. If the systems are uncorrelated at t = 0, so that Ω(0) = Ω_1(0) ⊗ Ω_2(0), but are allowed to interact after that point, then G[Ω(0)] = G[Ω_1(0)] + G[Ω_2(0)]. As G[Ω] is unchanged by the unitary evolution of the whole system, Equation (32) gives, for all t > 0,

$$G[\Omega_1(0)] + G[\Omega_2(0)] \geq G[\Omega_1(t)] + G[\Omega_2(t)] \tag{33}$$

with equality occurring at time t if, and only if, Ω(t) = Ω_1(t) ⊗ Ω_2(t).
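A NumPy sketch of Equations (32) and (33), using a random joint unitary as a stand-in for an arbitrary interaction (dimensions, seed and helper names are illustrative assumptions). Since G[Ω] is unchanged by the joint unitary and additive for the initial product, the decrease in G[Ω_1] + G[Ω_2] equals C[Ω(t)].

```python
import numpy as np


def random_density_matrix(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def random_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))


def gibbs_measure(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(np.sum(w * np.log(w)))


def marginals(omega, d1, d2):
    t = omega.reshape(d1, d2, d1, d2)
    return np.einsum('ajbj->ab', t), np.einsum('iaib->ab', t)


def correlation(omega, d1, d2):
    """C[Omega] = G[Omega] - G[Omega_1] - G[Omega_2], eq. (32)."""
    o1, o2 = marginals(omega, d1, d2)
    return gibbs_measure(omega) - gibbs_measure(o1) - gibbs_measure(o2)


rng = np.random.default_rng(5)
d1, d2 = 2, 3
o1_0, o2_0 = random_density_matrix(d1, rng), random_density_matrix(d2, rng)
omega_0 = np.kron(o1_0, o2_0)                      # uncorrelated at t = 0
u = random_unitary(d1 * d2, rng)                   # stands in for any joint unitary evolution
omega_t = u @ omega_0 @ u.conj().T
o1_t, o2_t = marginals(omega_t, d1, d2)

before = gibbs_measure(o1_0) + gibbs_measure(o2_0)
after = gibbs_measure(o1_t) + gibbs_measure(o2_t)
print(abs(correlation(omega_0, d1, d2)) < 1e-10)   # True: product state has C = 0
print(correlation(omega_t, d1, d2) >= 0.0)         # True: eq. (32)
print(before >= after)                             # True: eq. (33)
```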
2.4 Canonical distribution 

The extremal of G[Ω] for a fixed value of ⟨H⟩_Ω = E is given by the canonical distribution:

$$\Omega^{(\beta)} = \frac{e^{-\beta(E) H}}{\mathrm{Tr}\left[e^{-\beta(E) H}\right]} \tag{34}$$

where β(E) is a parameter depending only on the Hamiltonian H and the fixed value E.

We will assume that the extremal value is always the minimal value, although this is a far from 
trivial assumption. 
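For a given spectrum, β(E) in Equation (34) can be found by bisection, because ⟨H⟩ decreases monotonically as β grows (its derivative with respect to β is minus the variance of H). The NumPy sketch below assumes a hypothetical four-level Hamiltonian diag(0, 1, 2, 3) and E = 1.2. It checks that the resulting canonical distribution has the target energy, that G[Ω^(β)] + β⟨H⟩ = −ln Tr[e^{−βH}], and that perturbations keeping both normalisation and ⟨H⟩ fixed raise G, as expected at a minimum.

```python
import numpy as np


def boltzmann_weights(energies, beta):
    """Normalised e^{-beta E}, shifted before exponentiating to avoid overflow."""
    x = -beta * energies
    w = np.exp(x - x.max())
    return w / w.sum()


def canonical(h, beta):
    """Canonical distribution of eq. (34) for a Hermitian H."""
    e, v = np.linalg.eigh(h)
    return v @ np.diag(boltzmann_weights(e, beta)) @ v.conj().T


def mean_energy(h, beta):
    e = np.linalg.eigvalsh(h)
    return float(np.dot(e, boltzmann_weights(e, beta)))


def beta_for_energy(h, energy, lo=-50.0, hi=50.0, iterations=200):
    """Bisection for beta(E); <H> falls monotonically as beta grows."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_energy(h, mid) > energy:
            lo = mid          # energy still too high: beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gibbs_from_populations(p):
    return float(np.sum(p * np.log(p)))


h = np.diag([0.0, 1.0, 2.0, 3.0])        # hypothetical four-level spectrum
E = 1.2
beta = beta_for_energy(h, E)
omega_c = canonical(h, beta)
p = np.real(np.diag(omega_c))            # canonical populations, in the order of diag(h)
ln_z = np.log(np.sum(np.exp(-beta * np.diag(h))))

print(beta > 0, np.isclose(np.trace(h @ omega_c).real, E))       # True True
print(np.isclose(gibbs_from_populations(p) + beta * E, -ln_z))    # True

# Perturbations that keep both normalisation and <H> fixed can only raise G
raised = []
for delta in (np.array([1.0, -2.0, 1.0, 0.0]), np.array([0.0, 1.0, -2.0, 1.0])):
    for eps in (1e-3, -1e-3):
        q = p + eps * delta
        raised.append(gibbs_from_populations(q) > gibbs_from_populations(p))
print(all(raised))                                                # True
```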
2.4.1 Subspaces

If there is no interaction term between subspaces of a canonically distributed system, the marginal distributions over the subspaces are canonically distributed with the same β parameter:

$$H = H_1 \otimes I_2 + I_1 \otimes H_2 \tag{35}$$

$$\frac{e^{-\beta H}}{\mathrm{Tr}\left[e^{-\beta H}\right]} = \frac{e^{-\beta H_1}}{\mathrm{Tr}\left[e^{-\beta H_1}\right]} \otimes \frac{e^{-\beta H_2}}{\mathrm{Tr}\left[e^{-\beta H_2}\right]} \tag{36}$$
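Equation (36) can be checked numerically with random H_1 and H_2 (β = 0.7, the dimensions and the seed are arbitrary choices for this NumPy sketch). The second comparison adds a random interaction term, for which the marginal is generally no longer the canonical distribution of H_1 alone.

```python
import numpy as np


def random_hermitian(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2


def canonical(h, beta):
    """e^{-beta H} / Tr[e^{-beta H}] via the eigendecomposition of H."""
    e, v = np.linalg.eigh(h)
    x = -beta * e
    w = np.exp(x - x.max())
    return v @ np.diag(w / w.sum()) @ v.conj().T


rng = np.random.default_rng(6)
d1, d2, beta = 2, 3, 0.7
h1, h2 = random_hermitian(d1, rng), random_hermitian(d2, rng)
h = np.kron(h1, np.eye(d2)) + np.kron(np.eye(d1), h2)          # eq. (35)

factorises = np.allclose(canonical(h, beta), np.kron(canonical(h1, beta), canonical(h2, beta)))

# With an interaction term, the marginal is no longer canonical for H_1 alone
h_int = h + 0.5 * random_hermitian(d1 * d2, rng)
marginal1 = np.einsum('ajbj->ab', canonical(h_int, beta).reshape(d1, d2, d1, d2))
print(factorises, np.allclose(marginal1, canonical(h1, beta)))   # True False
```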



2.4.2 Minimising G + β⟨H⟩

Given a state space, a Hamiltonian H, the canonical distribution Ω^(β) for that Hamiltonian and any other distribution Ω' over that space:

$$G[\Omega'] + \beta\langle H\rangle_{\Omega'} \geq G[\Omega^{(\beta)}] + \beta\langle H\rangle_{\Omega^{(\beta)}} \tag{37}$$

The canonical distribution not only minimises G for a fixed value of ⟨H⟩, but also minimises G + β⟨H⟩ for a fixed value of β.

The result can be rearranged to give

$$G[\Omega'] - G[\Omega^{(\beta)}] \geq -\beta\left(\langle H\rangle_{\Omega'} - \langle H\rangle_{\Omega^{(\beta)}}\right) \tag{38}$$
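Equation (37) is Equation (31) in another form: since ln Ω^(β) = −βH − ln Tr[e^{−βH}], the difference G[Ω'] + β⟨H⟩_{Ω'} − (G[Ω^(β)] + β⟨H⟩_{Ω^(β)}) equals Tr[Ω'(ln Ω' − ln Ω^(β))] ≥ 0, and the minimum value is −ln Tr[e^{−βH}]. The NumPy sketch below checks this against random distributions for a random Hermitian H with β = 1.3 (all values illustrative).

```python
import numpy as np


def random_density_matrix(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def random_hermitian(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2


def gibbs_measure(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(np.sum(w * np.log(w)))


def canonical(h, beta):
    e, v = np.linalg.eigh(h)
    x = -beta * e
    w = np.exp(x - x.max())
    return v @ np.diag(w / w.sum()) @ v.conj().T


def objective(rho, h, beta):
    """G[rho] + beta <H>_rho, the quantity minimised in eq. (37)."""
    return gibbs_measure(rho) + beta * np.trace(h @ rho).real


rng = np.random.default_rng(7)
d, beta = 4, 1.3
h = random_hermitian(d, rng)
omega_c = canonical(h, beta)

x = -beta * np.linalg.eigvalsh(h)
ln_z = x.max() + np.log(np.sum(np.exp(x - x.max())))       # ln Tr[e^{-beta H}]
print(np.isclose(objective(omega_c, h, beta), -ln_z))        # True

others = [random_density_matrix(d, rng) for _ in range(500)]
print(all(objective(o, h, beta) >= objective(omega_c, h, beta) for o in others))   # True: eq. (37)
```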



2.4.3 Interactions with arbitrary distributions

Consider interactions between a canonically distributed system Ω_1^(β) and an arbitrarily distributed system Ω_2. They are initially (t = 0) non-interacting, H = H_1 ⊗ I_2 + I_1 ⊗ H_2, and uncorrelated, Ω(0) = Ω_1^(β)(0) ⊗ Ω_2(0). The systems are allowed to interact, H' = H + V_12, for a finite period of time, but such that at the end of the interaction ⟨H⟩_{Ω(t)} = ⟨H⟩_{Ω(0)}.

It can then be shown, by combining Equation (33) with Equation (38) applied to system 1, that

$$G[\Omega_2(0)] + \beta\langle H_2\rangle_{\Omega_2(0)} \geq G[\Omega_2(t)] + \beta\langle H_2\rangle_{\Omega_2(t)} \tag{39}$$

2.4.4 Interactions between canonical distributions



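Consider interactions between two canonically distributed systems with different β parameters, Ω_1^(β_1) and Ω_2^(β_2). They are initially (t = 0) non-interacting, H = H_1 ⊗ I_2 + I_1 ⊗ H_2, and uncorrelated, Ω(0) = Ω_1^(β_1)(0) ⊗ Ω_2^(β_2)(0). The systems are allowed to interact, H' = H + V_12, for a finite period of time, outside which ⟨V_12⟩_{Ω(t)} = ⟨V_12⟩_{Ω(0)} = 0, but such that at the end of the interaction ⟨H⟩_{Ω(t)} = ⟨H⟩_{Ω(0)}.

It can be shown that

$$\beta_1\left(\langle H_1\rangle_{\Omega_1(t)} - \langle H_1\rangle_{\Omega_1(0)}\right) + \beta_2\left(\langle H_2\rangle_{\Omega_2(t)} - \langle H_2\rangle_{\Omega_2(0)}\right) \geq 0 \tag{40}$$

Using the notation:

$$\Delta H_1 = \langle H_1\rangle_{\Omega_1(t)} - \langle H_1\rangle_{\Omega_1(0)} \tag{41}$$

$$\Delta H_2 = \langle H_2\rangle_{\Omega_2(t)} - \langle H_2\rangle_{\Omega_2(0)} \tag{42}$$

this becomes

$$\Delta H = \Delta H_1 + \Delta H_2 = 0 \tag{43}$$

$$\Delta H_1\, (\beta_1 - \beta_2) \geq 0 \tag{44}$$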
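Equations (40)-(44) can be illustrated with the smallest possible example, which is an assumption of this NumPy sketch rather than part of the argument: two qubits with H_i = diag(0, 1), initially canonical with β_1 and β_2, coupled by a rotation inside the degenerate subspace spanned by |01⟩ and |10⟩. That rotation commutes with H_1 + H_2, so ⟨H⟩ is conserved exactly, and energy flows from the system with smaller β to the system with larger β.

```python
import numpy as np


def canonical_qubit(beta):
    """Canonical distribution for a qubit with H = diag(0, 1)."""
    w = np.array([1.0, np.exp(-beta)])
    return np.diag(w / w.sum())


def partial_swap(theta):
    """Rotation inside the degenerate {|01>, |10>} subspace; commutes with H_1 + H_2."""
    u = np.eye(4, dtype=complex)
    c, s = np.cos(theta), np.sin(theta)
    u[1:3, 1:3] = [[c, -1j * s], [-1j * s, c]]
    return u


h_single = np.diag([0.0, 1.0])
h1 = np.kron(h_single, np.eye(2))          # basis order |00>, |01>, |10>, |11>
h2 = np.kron(np.eye(2), h_single)


def energy_changes(beta1, beta2, theta):
    omega0 = np.kron(canonical_qubit(beta1), canonical_qubit(beta2))   # uncorrelated at t = 0
    u = partial_swap(theta)
    omega_t = u @ omega0 @ u.conj().T
    dh1 = np.trace(h1 @ (omega_t - omega0)).real                        # eq. (41)
    dh2 = np.trace(h2 @ (omega_t - omega0)).real                        # eq. (42)
    return dh1, dh2


u_test = partial_swap(0.3)
print(np.allclose(u_test @ (h1 + h2), (h1 + h2) @ u_test))             # True: <H> conserved

results = []
for beta1, beta2 in [(2.0, 0.5), (0.5, 2.0), (1.0, 1.0)]:
    for theta in np.linspace(0.0, np.pi, 7):
        dh1, dh2 = energy_changes(beta1, beta2, theta)
        results.append((beta1, beta2, dh1, dh2))

print(all(abs(dh1 + dh2) < 1e-12 for _, _, dh1, dh2 in results))       # True: eq. (43)
print(all(dh1 * (b1 - b2) >= -1e-12 for b1, b2, dh1, _ in results))     # True: eq. (44)
```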
2.5 Large uncorrelated canonical assemblies

We will now consider a particular type of system called a Large Uncorrelated Canonical Assembly (LUCA).

• The system is large, in that it has a very large number of degrees of freedom.

• The distribution over the state space is uncorrelated with any other system.

• The distribution over the state space is a canonical distribution, with parameter β.

• The system is an assembly [Per93]: it consists of a very large number of identical subsystems, with no interactions between the subsystems.

As the overall distribution is canonical and there are no interactions between subsystems, the subsystems have canonical distributions with the same parameter β and will not be correlated with each other.

2.5.1 Interactions with arbitrary distributions 

When another system interacts with a LUCA, the interaction will always be of a particular kind. The interacting system will have a succession of brief interactions with successive subsystems of the LUCA, such that no subsystem of the LUCA is ever encountered twice.

As each interaction with a subsystem is an interaction with a canonical system with parameter β, by Equation (39) the value of G[Ω_2(t)] + β⟨H_2⟩_{Ω_2(t)} for the interacting system cannot increase on any interaction. If there is no further barrier to prevent it, this value will approach its minimum. From Equation (37), the distribution which minimises this is the canonical distribution over H_2 with the parameter β.
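The approach described above can be simulated with a toy LUCA in NumPy, under assumptions that are ours rather than the text's: the interacting system is a qubit with H_2 = diag(0, 1), every LUCA subsystem is a fresh qubit in the canonical distribution with β = 1, and each brief interaction is a hypothetical energy-conserving partial swap within the |01⟩, |10⟩ subspace. Each subsystem is discarded after use, so none is met twice. The sketch shows G[Ω_2] + β⟨H_2⟩ never increasing and the qubit relaxing to the canonical distribution, where G + β⟨H⟩ = −ln(1 + e^{−β}).

```python
import numpy as np


def gibbs_measure(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(np.sum(w * np.log(w)))


def canonical_qubit(beta):
    w = np.array([1.0, np.exp(-beta)])
    return np.diag(w / w.sum()).astype(complex)


def partial_swap(theta):
    """Energy-conserving rotation inside the {|01>, |10>} subspace of system (x) subsystem."""
    u = np.eye(4, dtype=complex)
    c, s = np.cos(theta), np.sin(theta)
    u[1:3, 1:3] = [[c, -1j * s], [-1j * s, c]]
    return u


def trace_out_subsystem(joint):
    """Trace over the second factor of system (x) LUCA subsystem."""
    return np.einsum('ajbj->ab', joint.reshape(2, 2, 2, 2))


beta = 1.0
h_sys = np.diag([0.0, 1.0])
gamma = canonical_qubit(beta)            # state of every fresh LUCA subsystem
u = partial_swap(0.8)                    # hypothetical collision strength


def objective(rho):
    """G[Omega_2] + beta <H_2>, which eq. (39) says cannot increase."""
    return gibbs_measure(rho) + beta * np.trace(h_sys @ rho).real


rho = np.array([[0.2, 0.3], [0.3, 0.8]], dtype=complex)   # arbitrary initial distribution
history = [objective(rho)]
for _ in range(60):
    joint = u @ np.kron(rho, gamma) @ u.conj().T           # fresh, uncorrelated subsystem each time
    rho = trace_out_subsystem(joint)                        # the used subsystem is discarded
    history.append(objective(rho))

print(all(b <= a + 1e-12 for a, b in zip(history, history[1:])))   # True
print(np.allclose(rho, gamma, atol=1e-8))                          # True
print(np.isclose(history[-1], -np.log(1.0 + np.exp(-beta))))      # True
```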

2.5.2 Interactions with canonical distributions 

Now consider an interaction between a LUCA and a system already canonically distributed with the same parameter β.

Consider a very slow variation in the Hamiltonian of the system from H_1(0) to H_1(t). We might suppose that we proceed in a series of small steps. First isolate the system and make a very small change in its Hamiltonian, sufficiently slowly that the quantum mechanical adiabatic theorem applies [Mes62, Chapter 17]. As an isolated system, its distribution will move slightly away from canonical. Then bring it back into contact with the LUCA, and the distribution will be restored to a canonical distribution with parameter β.

As the steps become infinitesimal, the system remains in the canonical distribution, but now it is a time-varying canonical distribution:

$$\Omega^{(\beta)}(t) = \frac{e^{-\beta H_1(t)}}{\mathrm{Tr}\left[e^{-\beta H_1(t)}\right]} \tag{45}$$

As this is always diagonal in the H_1(t) basis, we have

$$\frac{d\langle H_1(t)\rangle_{\Omega^{(\beta)}(t)}}{dt} = \left\langle \frac{\partial H_1(t)}{\partial t}\right\rangle_{\Omega^{(\beta)}(t)} + \sum_\alpha E_\alpha(t)\, \frac{d}{dt}\left(\frac{e^{-\beta E_\alpha(t)}}{\mathrm{Tr}\left[e^{-\beta H_1(t)}\right]}\right) \tag{46}$$

where E_α(t) is the instantaneous energy eigenvalue of the instantaneous eigenstate |E_α(t)⟩ of H_1(t).
Adding the identity

$$0 = \frac{1}{\beta}\ln \mathrm{Tr}\left[e^{-\beta H_1(t)}\right]\, \frac{d}{dt}\sum_\alpha \frac{e^{-\beta E_\alpha(t)}}{\mathrm{Tr}\left[e^{-\beta H_1(t)}\right]}$$

the last term can be rearranged as

$$\sum_\alpha E_\alpha(t)\, \frac{d}{dt}\left(\frac{e^{-\beta E_\alpha(t)}}{\mathrm{Tr}\left[e^{-\beta H_1(t)}\right]}\right) = -\frac{1}{\beta}\sum_\alpha \ln\left(\frac{e^{-\beta E_\alpha(t)}}{\mathrm{Tr}\left[e^{-\beta H_1(t)}\right]}\right) \frac{d}{dt}\left(\frac{e^{-\beta E_\alpha(t)}}{\mathrm{Tr}\left[e^{-\beta H_1(t)}\right]}\right) \tag{47}$$

and, since the canonical probabilities always sum to one, this is

$$-\frac{1}{\beta}\, \frac{d}{dt}\, G\left[\Omega^{(\beta)}(t)\right] \tag{48}$$

Integrating Equation (46) from 0 to t then gives:

$$D[H_1(t)] = \Delta H_1(t) + \frac{1}{\beta}\, \Delta G\left[\Omega^{(\beta)}\right] \tag{49}$$
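Equation (49) can be checked along an explicit quasi-static path. This NumPy sketch assumes a hypothetical four-level H_1(t) whose eigenbasis is fixed and whose levels are stretched linearly in time, E_α(t) = E_α(0)(1 + 0.8t), with β = 2; D[H_1] is evaluated with the trapezoid rule. The same quantity also equals the change in −(1/β) ln Tr[e^{−βH_1}], since ⟨H_1⟩ + G/β = −(1/β) ln Tr[e^{−βH_1}] for a canonical distribution.

```python
import numpy as np

beta = 2.0
e0 = np.array([0.0, 0.5, 1.3, 2.0])   # hypothetical initial spectrum of H_1(0)
rate = 0.8                             # hypothetical protocol: E_a(t) = E_a(0) (1 + rate t)


def energies(t):
    return e0 * (1.0 + rate * t)


def populations(t):
    """Canonical occupations e^{-beta E_a(t)} / Tr[e^{-beta H_1(t)}] along the path, eq. (45)."""
    x = -beta * energies(t)
    w = np.exp(x - x.max())
    return w / w.sum()


def gibbs_measure(p):
    return float(np.sum(p * np.log(p)))


def ln_z(t):
    x = -beta * energies(t)
    return x.max() + np.log(np.sum(np.exp(x - x.max())))


ts = np.linspace(0.0, 1.0, 20001)
# <dH_1/dt> on the canonical path; the eigenbasis is fixed here, so dE_a/dt = rate * E_a(0)
power = np.array([np.dot(populations(t), rate * e0) for t in ts])
work = float(np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(ts)))   # D[H_1], eq. (29)

delta_h = np.dot(populations(1.0), energies(1.0)) - np.dot(populations(0.0), energies(0.0))
delta_g = gibbs_measure(populations(1.0)) - gibbs_measure(populations(0.0))

print(np.isclose(work, delta_h + delta_g / beta, atol=1e-8))          # True: eq. (49)
print(np.isclose(work, -(ln_z(1.0) - ln_z(0.0)) / beta, atol=1e-8))   # True
```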



3 Not thermodynamics 

The properties we have considered here are simply mathematical properties of the Hamiltonian 
evolution of distributions over state spaces. They will apply to any function that has the properties 
of being a distribution. No physical interpretation has been placed upon them, and no physical 
interpretation should be placed upon them unless it can be justified that the property concerned 
does, in fact, correspond to a physical property of interest. 

We have introduced a particular measure, the Gibbs-von Neumann measure, which proves to have certain properties. We have also identified a particular distribution, which is uniquely selected by that measure, and a particular type of system, which has that unique distribution. The resulting description produces equations that closely resemble thermodynamics. It is tempting to identify G with the negative of entropy and to see Equation (33) as representing the Second Law of Thermodynamics. It is tempting to identify the canonical distribution as representing thermal equilibrium, as it is the state that maximises −G at fixed mean energy; to identify β as the reciprocal of temperature on the basis of Equation (44); to identify an environmental heat bath with a LUCA on the basis of Section 2.5.1; and to identify Equation (49) as the isothermal equation ΔW = ΔE − TΔS. But how justified is all this?

There are two problems. The first is why should one suppose —G represents entropy or canonical 
distributions represent thermal equilibrium? The second is whether it is even valid to identify the 
thermodynamic entropy with a measure upon a distribution. 
3.1 Why these distributions, these measures?

The first problem can be seen in the earliest works, such as [Gib02, Chapter XIV], where different distributions are discussed which may also appear to reproduce thermodynamic results. Gibbs cautiously refers only to thermodynamic "analogies" in statistical mechanics (a practice echoed in [Tol38], amongst others).

In his review [Pen79], Penrose shows that the question remains:

what is the physical significance of a Gibbs ensemble? How can we justify the standard 
ensembles used in equilibrium theory? 

Let us consider the mathematical structure of the previous section. The properties derived are almost entirely consequences of two things:

1. −G is a concave function of distributions, equivalently G is convex (Section 2.3.1);

2. the sum of the marginal values of G for two uncorrelated distributions is greater than the sum of the marginal values of G for two correlated distributions (Section 2.3.2).

If we are tempted to identify −G as entropy and β as inverse temperature on the basis of the relationships derived, wouldn't any non-decreasing, concave function, with the appropriate property for uncorrelated distributions², be able to do the job?

In recent years it has also been argued that, in quantum mechanics, it has simply been assumed 
that the von Neumann entropy is the appropriate one, and that the only justification offered for it 
is flawed: 

The convention first appears in Von Neumann's Mathematical Foundations of Quantum 
Mechanics. The argument given there to justify this convention is the only one hitherto 
offered. All the arguments in the field refer to it at one point or another. Here this 
argument is shown to be invalid. [She99] 

If we assume that the canonical distribution is appropriate for thermal equilibrium, we may 
reasonably represent an ideal heat bath by a LUCA, and from this (for large thermal systems, 
at least) it is possible to show that the von Neumann entropy correctly gives the value of the 
thermodynamic entropy. But what is the justification for using the canonical distribution, except 
that it maximises the von Neumann entropy? 

If we start by identifying a LUCA as an ideal heat bath, we can show that thermalisation corresponds to approaching the canonical distribution and so, perhaps, justify the canonical distribution
as appropriate for thermalisation. But why assume that an ideal heat bath is a LUCA? A LUCA is 
canonically distributed already, so assuming that it represents an environment at some temperature 
is tantamount to assuming the very thing we would wish to demonstrate. 

If we assume that the von Neumann entropy is the thermodynamic entropy, then maximising 
it produces the canonical distribution. This may justify the canonical distribution as thermal equilibrium and hence LUCAs as ideal heat baths. But, without assuming the canonical distribution
is thermal equilibrium in the first place, what reason do we have for believing the von Neumann 
entropy is thermodynamic entropy? 

Although we appear to have arrived at expressions that are analogous to thermodynamic expressions, we cannot identify these expressions with thermodynamic processes unless we can be
sure that they really are the appropriate representation of the physical process. There appears to 
be a logical gap. 

² Technically, this property can be obtained for G from (a) the convexity of G (the concavity of −G); and (b) the additivity of G for uncorrelated distributions. It is only G that has this property. However, this additivity is not necessary for the derived property to hold, so the derived properties may still hold for other, non-additive, functions with the same convexity property.
3.1.1 Why the canonical distribution?

There is a large literature devoted to deriving the canonical distribution. Attempts in the
literature to justify the canonical distribution are largely to do with the problem of explaining 
the approach to thermal equilibrium. It is assumed that the canonical distribution is thermal 
equilibrium (and an entropy usually has already been decided upon), and the attempt is to explain 
why systems are in thermal equilibrium. While this is not the same as our concern here, it will 
be useful to briefly review these attempts. 

The Ehrenfests [EE12, Section 25] credit Boltzmann with the first observation which justifies the canonical ensemble. The essence of this justification is that if one takes a large system whose distribution is uniform over a constant energy hypersurface (i.e. a microcanonical distribution), and one takes a small subsystem of that, then the marginal distribution of the small subsystem is canonical. Indeed, with minor variations, this relationship between the microcanonical and canonical distributions is practically the only justification offered in most textbooks.

The problem then becomes to justify the microcanonical distribution. Some, following Tolman, simply make a fundamental assumption of a uniform distribution, the only justification being, in effect, the Principle of Insufficient Reason to argue the inappropriateness of any other choice.

Attempts to justify the uniform distribution on dynamical grounds, as argued by the Ehrenfests, have led to the development of Boltzmann's ergodic hypothesis and of concepts such as metric transitivity and weak and strong mixing. Although this has generated much interesting mathematics, as a justification of the microcanonical distribution for realistic systems it can only be said to have had a mixed degree of success (see [BFK06] for a discussion and defence).

A recent development [PSW06, GLTZ06] of Boltzmann's original insight, specific to quantum
mechanics, demonstrates that for large systems in a pure quantum state, the reduced density 
matrix of a small subsystem is very close to being canonically distributed. This appears to produce 
the canonical distribution even without needing a probability distribution over the whole space. 
Unfortunately the result is not true for all pure states, only "almost every" or the "overwhelming 
majority" of such states. The problem here is that these terms are only valid relative to some 
measure over the state space and, as it turns out, that measure is the uniform one. In other words, 
the development shows that it is overwhelmingly probable that the individual subsystem behaves 
as if it is canonically distributed, if we have a uniform probability distribution over the whole state 
space. While this is a stronger result than Boltzmann's, it cannot be said to have less problematical 
assumptions. 

3.1.2 Why the Gibbs-von Neumann measure? 

Once the Gibbs-von Neumann entropy is chosen as physical entropy, it is possible to argue that 
the canonical distribution is appropriate for thermal equilibrium as it maximises the entropy of 
thermally isolated systems. 

Our problem here is why the Gibbs-von Neumann entropy should be used at all. This measure 
can certainly be uniquely identified from a number of information theoretic prescriptions [Sha48, SW49]. But why should such information theoretic concerns be of any significance for
thermodynamics? Why should thermal equilibrium have anything to do with maximising our lack 
of knowledge? 

The idea that entropy has something to do with a lack of knowledge or uncertainty is an old one, but unless one has already assumed that the measure of entropy is indeed a function of probability, what is the basis for believing that thermodynamic entropy should have anything to do with uncertainty? A priori, what is the property of thermal states that makes us think they represent maximal ignorance? Even if this is accepted, there are many measures of ignorance. Why are the properties of the Shannon measure of information the ones that identify the function that needs maximising? The existence of alternative information measures, such as the Rényi measures, and of alternative entropy measures, such as the Tsallis entropy [Tsa88, Tsa00, Tsa03] and others [CB07, Cam07], calls into question whether assumptions that uniquely specify the Shannon measure can be taken for granted.

3.2 Why distributions? What is entropy anyway? 

The second problem is the complaint of authors such as: 

The Gibbs entropy is not even an entity of the right sort: It is a function of a probability 
distribution, i.e., of an ensemble of systems, and not a function on phase space, a 
function of the actual state X of an individual system [Gol01]

thermodynamic entropy is patently an attribute of individual systems. And attributes 
of individual systems can patently be nothing other than attributes of the individual 
microconditions. [AlbOlj 

for present purposes - reconciling thermodynamics with mechanics - [Gibbs entropy] is 
of no use since thermodynamic entropy is applicable to individual systems. My coffee 
in the thermos has an objective thermodynamic entropy as a property. [Cal04] 

The literature abounds with alternative definitions of entropy (in a recent work [CS05, Chapter 1], 21 different versions of entropy are listed). Perhaps the question should not be "What is the correct expression for entropy?", but "What exactly is 'entropy' supposed to be?". What is it about a particular expression that legitimates referring to it as 'entropy'? What, in short, is 'entropy' for? To develop a physical understanding of this, we will let the answer emerge from statistical mechanics, rather than presuppose it.

4 Statistical Mechanics 

After having asked many questions in the previous Section, we will now proceed by ignoring them. 
We will develop statistical mechanics without any reference to entropy at all. The main purpose 
of this is to demonstrate that much of the physical understanding of statistical mechanics may be 
developed without any reference to thermodynamics. 

The basic assumption of this section is that we have a physical system of interest where it is 
valid to talk about a probability of the system being in a particular state. We will not consider 
why such a probabilistic situation has occurred, and will attempt to avoid all discussion of what 
'probability' actually means. Instead we will take for granted that we are dealing with situations 
where statements of the form "There is a probability p(X) that the system is in state X" are 
meaningful, and work through the consequences of this. From now on, when we refer to a system, 
we will mean a physical system, with a state space, a Hamiltonian evolution and probability for 
the system being in any particular state. 

4.1 Operators and evolutions 

To go back to basics, we start by deciding what we can say about the average value of observing an observable. Suppose we have an observable A; then the expectation value of the observable, when the system is in state |a_n⟩, is ⟨a_n|A|a_n⟩. If the state |a_n⟩ has probability p(a_n), then the expectation value of the observable is

$$\langle A\rangle = \sum_n p(a_n)\, \langle a_n|A|a_n\rangle \tag{50}$$

This can be rewritten as

$$\langle A\rangle = \mathrm{Tr}[\rho A] \tag{51}$$

where

$$\rho = \sum_n p(a_n)\, |a_n\rangle\langle a_n| \tag{52}$$

(Note that we have not assumed that the set {|a_n⟩} is an orthonormal basis.)

ρ is the density matrix for the system. All the statistical properties of the system can be
calculated from the density matrix. As well as mean values we may also calculate variances, 
standard deviations, and indeed all of the standard apparatus of statistics and probability theory. 
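Equations (50)-(52) can be illustrated with a small ensemble of non-orthogonal states. The dimension, the number of states and all random values in this NumPy sketch are illustrative assumptions; it checks that Σ_n p(a_n)⟨a_n|A|a_n⟩ equals Tr[ρA] and computes a variance from the same ρ.

```python
import numpy as np

rng = np.random.default_rng(8)
d, n_states = 3, 5
# Hypothetical ensemble: five normalised, mutually non-orthogonal states |a_n> in three dimensions
states = rng.normal(size=(n_states, d)) + 1j * rng.normal(size=(n_states, d))
states /= np.linalg.norm(states, axis=1, keepdims=True)
probs = rng.random(n_states)
probs /= probs.sum()                                   # p(a_n)

rho = sum(p * np.outer(a, a.conj()) for p, a in zip(probs, states))     # eq. (52)

obs = rng.normal(size=(d, d))
obs = (obs + obs.T) / 2                                # a Hermitian observable A

direct = sum(p * (a.conj() @ obs @ a).real for p, a in zip(probs, states))   # eq. (50)
via_trace = np.trace(rho @ obs).real                                         # eq. (51)
variance = np.trace(rho @ obs @ obs).real - via_trace ** 2                   # another statistic from rho

print(np.isclose(direct, via_trace))    # True
print(variance >= 0.0)                  # True
```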

We also note that the density matrix fulfils the criteria for a distribution provided that (for isolated systems, at least), if the state |a_n⟩ at time t = 0 has probability p(a_n), the state evolves into |a'_n⟩ at a later time t = τ by a Hamiltonian evolution, and the state |a'_n⟩ then has probability p(a_n).

At the risk of further stating the obvious, let us just remember a few other things. The 
expectation value is not telling us the exact value that we will actually get, nor is it even telling 
us that the value we will actually get is close to this value. We should no more expect this than 
expect that when we roll a die, the face should come up with a number close to three and a half. 

The value we get is an expectation value because that is the statistical property we have chosen to calculate. Statistics is certainly not limited to calculating expectation values! If we want to calculate other statistical properties, or perform other statistical operations, the density matrix certainly allows us to do so. If we find physical reasons for preferring other statistics, then those other statistics are what we should use. There is nothing intrinsically special about expectation values! Finally, to avoid cumbersome words, from now on we will refer to the expectation value of properties as the mean value.

4.2 Isolated systems 

If, as is usual, we identify the Hamiltonian as the energy operator, then the mean energy of the system is

$$\langle H\rangle_\rho = \mathrm{Tr}[H\rho] \tag{53}$$

We now consider how this mean value varies with time.

4.2.1 Work 

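For an isolated³ system, the density matrix evolves unitarily:

$$i\hbar\frac{\partial\rho}{\partial t} = [H,\rho]_- \tag{54}$$

so the mean energy changes as

$$\frac{d\langle H\rangle_\rho}{dt} = \left\langle\frac{\partial H}{\partial t}\right\rangle_\rho \tag{55}$$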

3 Note that we do not take isolated to mean having a time-independent Hamiltonian. We take isolated to mean only that there is no interaction Hamiltonian with another system.
Still back to basics, we might ask what this means. The left-hand side is clearly the rate at which
the mean energy of the system is changing. What of the right? 

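Let us suppose that the Hamiltonian is a function of some parameters $(x,y,z)$ that are varying in time:

$$H = \sum_n E_n(x,y,z)\,|E_n(x,y,z)\rangle\langle E_n(x,y,z)| \tag{56}$$

The eigenstates can be rewritten

$$|E_n(x,y,z)\rangle = T(x,y,z)\,|E_n\rangle \tag{57}$$

so that the operator

$$\frac{\partial H}{\partial t} = \sum_n \dot{\mathbf{x}}\cdot\nabla E_n(x,y,z)\,|E_n(x,y,z)\rangle\langle E_n(x,y,z)| + \frac{1}{i\hbar}\left[\dot{\mathbf{x}}\cdot\boldsymbol{\Theta},H\right]_- \tag{58}$$

where

$$\dot{\mathbf{x}} = \left(\frac{dx}{dt},\frac{dy}{dt},\frac{dz}{dt}\right) \tag{59}$$

$$\boldsymbol{\Theta} = \left(i\hbar\frac{\partial T}{\partial x}T^\dagger,\; i\hbar\frac{\partial T}{\partial y}T^\dagger,\; i\hbar\frac{\partial T}{\partial z}T^\dagger\right) \tag{60}$$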

The first term of Equation (58) should be recognised as containing the generalised force $\nabla E_n(x,y,z)$ that comes from the change in energy eigenvalues due to a change in the parameters $(x,y,z)$. Its product with the rate of change of those parameters gives the rate of work against the force.

The second part is slightly more subtle. If the energy eigenstates are varying, then, if we were to 
keep the energy eigenvalues and the state of the system fixed, the expectation value of the energy 
for that state would be changing. This term, therefore, represents the rate of work required to 
rotate the energy eigenstates. 
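The decomposition in Equation (58) can be checked numerically. The sketch below assumes a hypothetical two-level Hamiltonian depending on a single parameter $\theta$ (playing the role of $x$, with $\dot{x} = 1$), whose eigenvectors are rotated by a real plane rotation $T(\theta)$ and whose eigenvalues $E_0(\theta)$, $E_1(\theta)$ are arbitrary smooth functions. For a real $T$, $\Theta/i\hbar = T'T^\dagger$, so the second term reduces to the real commutator $[T'T^{\mathsf{T}},H]_-$.

```python
import math

def matmul(a, b):
    """Product of two 2x2 real matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def rotation(theta):
    """Hypothetical eigenvector rotation T(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def energies(theta):
    """Hypothetical parameter-dependent eigenvalues E_0(theta), E_1(theta)."""
    return 1.0 + 0.5 * theta, 3.0 - theta ** 2

def energy_derivatives(theta):
    return 0.5, -2.0 * theta

def hamiltonian(theta):
    """H = T diag(E_0, E_1) T^T, as in Equations (56)-(57)."""
    e0, e1 = energies(theta)
    t = rotation(theta)
    return matmul(matmul(t, [[e0, 0.0], [0.0, e1]]), transpose(t))

theta, h = 0.3, 1e-6

# Central finite difference for dH/dtheta
hp, hm = hamiltonian(theta + h), hamiltonian(theta - h)
numeric = [[(hp[i][j] - hm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

# First term of (58): sum_n E_n'(theta) |E_n(theta)><E_n(theta)|
t = rotation(theta)
d0, d1 = energy_derivatives(theta)
first = matmul(matmul(t, [[d0, 0.0], [0.0, d1]]), transpose(t))

# Second term of (58): [T' T^T, H], since Theta / (i hbar) = T' T^dagger for real T
gen = [[0.0, -1.0], [1.0, 0.0]]  # T'(theta) T(theta)^T for a plane rotation
H = hamiltonian(theta)
gh, hg = matmul(gen, H), matmul(H, gen)
second = [[gh[i][j] - hg[i][j] for j in range(2)] for i in range(2)]

max_err = max(abs(numeric[i][j] - first[i][j] - second[i][j]) for i in range(2) for j in range(2))
print(max_err < 1e-6)
```

The residual is limited only by the finite-difference step, which is consistent with the eigenvalue term and the eigenstate-rotation term together accounting for the whole of $\partial H/\partial t$.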

The term $\langle a_n|\partial H/\partial t|a_n\rangle$ gives the mean rate of work required for state $|a_n\rangle$, so $\langle\partial H/\partial t\rangle_\rho$ is just the mean rate of work required, given the density matrix $\rho$.

So, the rate at which the mean energy of the system is changing is equal to the mean rate at which work is being performed upon the system. Given that the system is isolated, we should hope so!

4.2.2 Adiabatic availability 

We now ask the question: how much work may be extracted from a given state $\rho$ by a cyclic variation of the system Hamiltonian⁴ $H$? We will call this the adiabatic availability of the state $\rho$ (see [GB91] [Chapter 5]). We assume that the system is completely isolated. Clearly the work extracted by a variation in the Hamiltonian is given by

$$W = -\int_0^\tau \left\langle\frac{\partial H}{\partial t}\right\rangle_\rho dt \tag{61}$$

As the system is isolated, the evolution of $\rho$ depends solely upon $H$, and we require that $H(t) = H_0$ for all $t < 0$ and $t > \tau$. This allows us to rewrite the result, rather trivially, as:

$$W = -\mathrm{Tr}\left[H_0U(\tau)\rho_0U^\dagger(\tau) - H_0\rho_0\right] \tag{62}$$



4 Note, here, that this is a cyclic variation in the Hamiltonian, not the state of the system. This is not, therefore, 
directly related to the Kelvin statement of the second law. 
where

$$\rho_0 = \sum_n p_n\,|n\rangle\langle n| \tag{63}$$

is the initial density matrix,

$$i\hbar\frac{dU(t)}{dt} = H(t)\,U(t) \tag{64}$$

and $U(0) = I$.

We would clearly like to extract as much work as possible from the system. Is there a limit to how much we can extract? If the initial Hamiltonian $H_0$ is not bounded from below (i.e. does not have a ground state with finite energy), then the answer is no: there is no limit. Excluding this case, the answer is yes.



4.2.3 Passive distributions 

Let us note that Equation (62) expresses the difference between the mean energy of the initial density matrix $\rho_0$ and that of another density matrix, which is related to it by a unitary transformation $U(\tau)$. To get the most work out, it is therefore necessary to vary the Hamiltonian so as to minimise the mean energy of the state $\rho(\tau) = U(\tau)\rho_0U^\dagger(\tau)$. As $\rho(\tau)$ must have the same eigenvalues as $\rho_0$, it turns out that the minimum is just the state which is diagonalised in the energy eigenstates

$$\rho(\tau) = \sum_n p_n\,|E_n\rangle\langle E_n| \tag{65}$$

such that

$$p_m > p_n \Leftrightarrow E_m < E_n \tag{66}$$

A density matrix which satisfies these criteria is called a passive distribution. Intuitively it is clear that, among density matrices with a given set of eigenvalues that are diagonalised in the energy eigenstates, a passive distribution must minimise the mean energy. If Equation (66) did not hold for two given states, it would always be possible to reduce the mean energy (and extract work) by swapping the probabilities of those two states.
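For density matrices already diagonal in the energy basis, the intuitive argument above can be checked by brute force. This Python sketch, with an arbitrary hypothetical spectrum and set of eigenvalues, tries every assignment of the eigenvalues to the energy eigenstates and confirms that the minimum mean energy is attained by the assignment of Equation (66): larger probabilities on lower energies.

```python
from itertools import permutations

def mean_energy(energies, probs):
    """Tr[H rho] for rho diagonal in the energy basis."""
    return sum(e * p for e, p in zip(energies, probs))

# Hypothetical non-degenerate spectrum and a fixed set of eigenvalues of rho
energies = [0.0, 1.0, 2.5, 4.0]
eigenvalues = [0.1, 0.4, 0.2, 0.3]

# Every way of placing the eigenvalues on the energy eigenstates
best = min(permutations(eigenvalues), key=lambda ps: mean_energy(energies, ps))

# The passive assignment of Equation (66): probabilities decreasing as energy increases
passive = sorted(eigenvalues, reverse=True)

print(list(best) == passive)
print(mean_energy(energies, eigenvalues) - mean_energy(energies, passive))  # extractable work
```

The brute-force search grows factorially and is only meant as an illustration; sorting gives the same answer directly.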

We now show that density matrices that are not diagonalised by the energy eigenstates are not passive. Take a density matrix which is assumed not to be diagonalised in the energy basis:

$$\rho_1 = \sum_j \lambda_j\,|\lambda_j\rangle\langle\lambda_j| \tag{67}$$

with the ordering $\lambda_i > \lambda_j \Leftrightarrow i < j$, and form the density matrix diagonalised in the energy basis:

$$\rho_2 = \sum_i \mu_i\,|E_i\rangle\langle E_i| = \sum_i \langle E_i|\rho_1|E_i\rangle\,|E_i\rangle\langle E_i| \tag{68}$$

for which clearly $\mathrm{Tr}[H\rho_1] = \mathrm{Tr}[H\rho_2]$. Now, labelling the energy eigenstates in order of increasing energy, compare $\rho_2$ with the passive distribution

$$\rho_3 = \sum_j \lambda_j\,|E_j\rangle\langle E_j| \tag{69}$$

As the eigenvalues are related by a doubly stochastic map

$$\mu_i = \sum_j \left|\langle E_i|\lambda_j\rangle\right|^2\lambda_j \tag{70}$$

it follows [HLP34] that

$$\sum_i \mu_iE_i \geq \sum_j \lambda_jE_j \tag{71}$$

with equality possible if and only if the doubly stochastic map is a permutation. For the $\rho_1$ distribution to be passive, this would require an identity map, which is not the case by assumption. It follows that $\mathrm{Tr}[H\rho_1] > \mathrm{Tr}[H\rho_3]$, but $\rho_3$ is clearly accessible from $\rho_1$ by a unitary map. $\rho_1$ is therefore not passive.
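The role of the doubly stochastic map in Equations (70) and (71) can be illustrated numerically. As a simplifying assumption, this sketch uses a real orthogonal change of basis (a product of three plane rotations with arbitrary angles) in place of a general unitary; the squared overlaps $|\langle E_i|\lambda_j\rangle|^2$ then form a doubly stochastic matrix, and the resulting $\sum_i \mu_iE_i$ exceeds the passive value $\sum_j \lambda_jE_j$.

```python
import math

def givens(n, i, j, theta):
    """n x n real rotation by theta in the (i, j) plane."""
    g = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    c, s = math.cos(theta), math.sin(theta)
    g[i][i], g[j][j] = c, c
    g[i][j], g[j][i] = -s, s
    return g

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

# O[i][j] plays the role of <E_i|lambda_j>; the angles are arbitrary
O = matmul(matmul(givens(3, 0, 1, 0.4), givens(3, 1, 2, 1.1)), givens(3, 0, 2, -0.7))
D = [[O[i][j] ** 2 for j in range(3)] for i in range(3)]  # doubly stochastic

energies = [0.0, 1.0, 3.0]   # E_1 < E_2 < E_3
lam = [0.6, 0.3, 0.1]        # lambda_1 > lambda_2 > lambda_3

mu = [sum(D[i][j] * lam[j] for j in range(3)) for i in range(3)]   # Equation (70)
energy_rho2 = sum(m * e for m, e in zip(mu, energies))   # Tr[H rho_2] = Tr[H rho_1]
energy_rho3 = sum(l * e for l, e in zip(lam, energies))  # Tr[H rho_3], passive

row_sums = [sum(row) for row in D]
col_sums = [sum(D[i][j] for i in range(3)) for j in range(3)]
print(row_sums, col_sums)
print(energy_rho2 > energy_rho3)
```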

For any given density matrix $\rho$ and Hamiltonian $H_0$, there is a passive distribution $\bar\rho$ with the same eigenvalues. The adiabatic availability is then:

$$A[\rho,H_0] = \mathrm{Tr}[H_0(\rho - \bar\rho)] \geq 0 \tag{72}$$

Equality is reached only for passive distributions, which all have adiabatic availabilities of zero. The adiabatic availability is always uniquely defined, although the passive distribution is unique only if the energy eigenvalues are non-degenerate.
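For a density matrix that is already diagonal in the energy basis, Equation (72) reduces to rearranging the probabilities, as in this minimal Python sketch. The function name and the example values are hypothetical; for a general $\rho$ one would first need its eigenvalues, which this sketch does not compute.

```python
def adiabatic_availability(energies, probs):
    """A[rho, H0] of Equation (72) for rho diagonal in the energy basis.

    energies[k] and probs[k] refer to the same eigenstate |E_k>."""
    current = sum(e * p for e, p in zip(energies, probs))
    # Passive rearrangement: largest probabilities on the lowest energies
    passive = sum(e * p for e, p in zip(sorted(energies), sorted(probs, reverse=True)))
    return current - passive

inverted = adiabatic_availability([0.0, 1.0, 2.0], [0.2, 0.5, 0.3])
already_passive = adiabatic_availability([0.0, 1.0, 2.0], [0.5, 0.3, 0.2])
print(inverted, already_passive)
```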

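For an isolated system, it is clearly the case that the work performed upon the system in any cyclic variation of the Hamiltonian must equal the change in adiabatic availability. For a non-cyclic variation in the Hamiltonian, this is not the case. If $H_0 \to H_1$ leads to $\rho_0 \to \rho_1$, then the work $W$ and change in adiabatic availability $\Delta A$ are related by

$$W = \mathrm{Tr}[H_1\rho_1] - \mathrm{Tr}[H_0\rho_0] \tag{73}$$

$$\Delta A = W - \left(\mathrm{Tr}[H_1\bar\rho_1] - \mathrm{Tr}[H_0\bar\rho_0]\right) \tag{74}$$

where $\bar\rho_0$ and $\bar\rho_1$ are the passive distributions, for $H_0$ and $H_1$ respectively, with the same eigenvalues as $\rho_0$.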

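An explicit cyclic Hamiltonian capable of extracting the available energy is given (for $0 < t < \tau$) by:

$$H(t) = H_0\cos\left(\frac{2\pi t}{\tau}\right) - \frac{2i\hbar}{\tau}\sin^2\left(\frac{\pi t}{\tau}\right)\ln\left(\sum_j |E_j\rangle\langle\lambda_j|\right) \tag{75}$$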

We note the following in passing: if the energy spectrum is bounded from above, then there is also a maximum amount of work that may be unitarily performed upon the system, and a set of distributions for which that maximum is zero. These distributions have the property:

$$p_m < p_n \Leftrightarrow E_m < E_n \tag{76}$$

Unless otherwise stated, we will assume that the energy spectrum is not bounded from above, in which case these distributions will not have finite mean energy, and we will not consider them.

4.3 Interacting systems 

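We now move on to the situation where we have two systems that are allowed to interact for a period of time, so that $H(t) = H_1\otimes I_2 + I_1\otimes H_2 + V_{12}$. The combined system is isolated, except through the variation of the Hamiltonian. First we have

$$\begin{aligned}\frac{d\langle H\rangle_\rho}{dt} &= \frac{d\langle H_1\rangle_\rho}{dt} + \frac{d\langle H_2\rangle_\rho}{dt} + \frac{d\langle V_{12}\rangle_\rho}{dt}\\ &= \left\langle\frac{\partial H}{\partial t}\right\rangle_\rho\\ &= \left\langle\frac{\partial H_1}{\partial t}\right\rangle_\rho + \left\langle\frac{\partial H_2}{\partial t}\right\rangle_\rho + \left\langle\frac{\partial V_{12}}{\partial t}\right\rangle_\rho\end{aligned} \tag{78}$$

with the middle line holding because the combined system is not interacting with any third system. We should not be surprised to see that the rate at which the mean energy of the combined system changes is equal to the mean of the rate at which work is performed upon the subsystems and the interaction between them.
We also have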



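$$i\hbar\frac{d\langle V_{12}\rangle_\rho}{dt} = i\hbar\left\langle\frac{\partial V_{12}}{\partial t}\right\rangle_\rho - Q[H_1] - Q[H_2] \tag{82}$$

where for convenience we define

$$Q[H_1] = \left\langle[H_1,V_{12}]_-\right\rangle_\rho = \sum_n i\hbar\frac{\partial p(\alpha_n)}{\partial t}\langle\alpha_n|H_1|\alpha_n\rangle + \left\langle[H_1,\Theta_1]_-\right\rangle_{\rho_1} \tag{83}$$

$$Q[H_2] = \left\langle[H_2,V_{12}]_-\right\rangle_\rho = \sum_n i\hbar\frac{\partial p(\beta_n)}{\partial t}\langle\beta_n|H_2|\beta_n\rangle + \left\langle[H_2,\Theta_2]_-\right\rangle_{\rho_2} \tag{84}$$

with

$$\rho_1 = \sum_n p(\alpha_n,t)\,|\alpha_n(t)\rangle\langle\alpha_n(t)| = \mathrm{Tr}_2[\rho(t)] \tag{86}$$

$$\rho_2 = \sum_n p(\beta_n,t)\,|\beta_n(t)\rangle\langle\beta_n(t)| = \mathrm{Tr}_1[\rho(t)] \tag{87}$$

and $\Theta_1$ and $\Theta_2$ are defined, as in Section 2.2, as the Hamiltonian operators

$$|\alpha_n(t)\rangle = T_\alpha(t)\,|\alpha_n(0)\rangle \tag{88}$$

$$i\hbar\frac{dT_\alpha(t)}{dt} = \Theta_1(t)\,T_\alpha(t) \tag{89}$$

$$|\beta_n(t)\rangle = T_\beta(t)\,|\beta_n(0)\rangle \tag{90}$$

$$i\hbar\frac{dT_\beta(t)}{dt} = \Theta_2(t)\,T_\beta(t) \tag{91}$$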

The term $Q[H_1]/i\hbar$ clearly represents the mean rate at which energy is flowing into system 1, in addition to work being performed upon it, and similarly for $Q[H_2]/i\hbar$ and system 2. Remember also that if any two of $H_1$, $\Theta_1$ and $\rho_1$ commute, the commutator term is zero.
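The interpretation of $Q[H_1]$ rests on a step not written out above. Using $i\hbar\,\partial\rho/\partial t = [H,\rho]_-$ and $\mathrm{Tr}\left[X[H,\rho]_-\right] = \mathrm{Tr}\left[[X,H]_-\rho\right]$, and noting that $H_1\otimes I_2$ commutes with both $H_1\otimes I_2$ and $I_1\otimes H_2$, one finds

$$i\hbar\frac{d\langle H_1\rangle_\rho}{dt} = i\hbar\left\langle\frac{\partial H_1}{\partial t}\right\rangle_\rho + \left\langle[H_1\otimes I_2, V_{12}]_-\right\rangle_\rho = i\hbar\left\langle\frac{\partial H_1}{\partial t}\right\rangle_\rho + Q[H_1]$$

and similarly for system 2, so the change in $\langle H_1\rangle_\rho$ splits into the work term and the term involving $Q[H_1]$.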

We will now consider some simplifying conditions. 



1. Constant interaction potential

If the only work being performed upon the joint system is through $H_1$ and $H_2$, then $\partial V_{12}/\partial t = 0$:

$$i\hbar\frac{d\langle V_{12}\rangle_\rho}{dt} + Q[H_1] + Q[H_2] = 0 \tag{93}$$



We now use the notation

$$\Delta X(\tau) = \int_0^\tau \frac{d\langle X\rangle_\rho}{dt}\,dt = \langle X(\tau)\rangle_\rho - \langle X(0)\rangle_\rho \tag{94}$$

$$D[X(\tau)] = \int_0^\tau \left\langle\frac{\partial X}{\partial t}\right\rangle_\rho dt \tag{95}$$

to consider how the mean energies of the systems change after a finite period of interaction. $\Delta H$ gives the change in the mean energy of the system over the course of the interaction, and $D[H]$ gives the mean work performed upon the system.

If the interaction term is constant in time, then

$$i\hbar\,\Delta V_{12} + \int_0^\tau Q[H_1]\,dt + \int_0^\tau Q[H_2]\,dt = 0 \tag{96}$$



2. Finite interaction duration

We now suppose that the systems are initially separated, so that $\mathrm{Tr}[V_{12}(0)\rho(0)] = 0$, and that at time $\tau$ they have been separated⁵ again, $\mathrm{Tr}[V_{12}(\tau)\rho(\tau)] = 0$. Then

$$\Delta Q = \frac{1}{i\hbar}\int_0^\tau Q[H_1]\,dt = -\frac{1}{i\hbar}\int_0^\tau Q[H_2]\,dt \tag{97}$$

$$\Delta H_1 = D[H_1] + \Delta Q \tag{98}$$

$$\Delta H_2 = D[H_2] - \Delta Q \tag{99}$$

$\Delta Q$ is the mean flow of energy between the two systems during the interaction.

3. No change to second system

Finally we consider the effect of $\partial H_2/\partial t = 0$:

$$D[H_1] = \Delta H_1 - \Delta Q \tag{100}$$

$$\Delta H_2 = -\Delta Q \tag{101}$$

The interaction with the second system can allow energy to flow from system 2 into system 1. If the work done on system 1 is negative, $D[H_1] < 0$, the energy flow can be extracted. This can still be true if the variation in $H_1$ is cyclic, and even if $\Delta H_1 = 0$.



4.3.1 Completely passive distributions 

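We complete the notion of adiabatic availability by noting that a stronger notion than passivity is required if composite systems are considered. Let us consider a joint system, with a joint Hamiltonian

$$H_{12} = H_1\otimes I_2 + I_1\otimes H_2 \tag{102}$$

and a joint density matrix

$$\rho_{12} = \rho_1\otimes\rho_2 \tag{103}$$

such that

$$H_1 = \sum_n E_n\,|E_n^{(1)}\rangle\langle E_n^{(1)}| \tag{104}$$

$$H_2 = \sum_n E_n\,|E_n^{(2)}\rangle\langle E_n^{(2)}| \tag{105}$$

$$\rho_1 = \sum_n p_n\,|E_n^{(1)}\rangle\langle E_n^{(1)}| \tag{106}$$

$$\rho_2 = \sum_n p_n\,|E_n^{(2)}\rangle\langle E_n^{(2)}| \tag{107}$$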

5 The moving of the systems can be achieved by variations in the internal Hamiltonians of each, so this does not conflict with $\partial V_{12}/\partial t = 0$.
Even if $\rho_1$ is a passive distribution (and, by construction, so will $\rho_2$ be), the combined density matrix $\rho_{12}$ may not be a passive distribution.

To show this, we need only consider three energy levels and their probabilities:

$$E_1 < E_2 < E_3 \tag{108}$$

$$p_1 > p_2 > p_3 \tag{109}$$

For the joint system to be passive, it is necessary that

$$(2E_2 - E_1 - E_3)\left(p_2^2 - p_1p_3\right) \leq 0 \tag{110}$$

It is a simple matter to find values⁶ for which this fails, and equally easy to find values⁷ for which this holds.
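The two footnoted examples can be checked with exact rational arithmetic. This Python sketch evaluates the left-hand side of Equation (110) and the energy released by exchanging the populations of $|E_1E_3\rangle$ and $|E_2E_2\rangle$ in the two-copy system; the function names are illustrative. It tests only the one pair of product states singled out in Equation (110), which is a necessary condition for passivity, not a full check.

```python
from fractions import Fraction as F

def condition_110(e, p):
    """Left-hand side of Equation (110); joint passivity requires it to be <= 0."""
    return (2 * e[1] - e[0] - e[2]) * (p[1] ** 2 - p[0] * p[2])

def swap_gain(e, p):
    """Mean energy released by exchanging the populations of |E1 E3> and |E2 E2>
    in the two-copy product state (positive means work can be extracted)."""
    e13, e22 = e[0] + e[2], 2 * e[1]
    p13, p22 = p[0] * p[2], p[1] ** 2
    before = e13 * p13 + e22 * p22
    after = e13 * p22 + e22 * p13
    return before - after

probs = [F(1, 2), F(1, 3), F(1, 6)]
for energies in ([1, 3, 4], [1, 2, 4]):  # footnote 6 and footnote 7 values
    print(energies, condition_110(energies, probs), swap_gain(energies, probs))
```

For this pair of states the released energy is $(E_{22} - E_{13})(p_{22} - p_{13})$, which is why its sign tracks that of Equation (110).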

If a product of 2 equivalent passive systems is passive, then the distribution may be termed 2-passive. Similarly, if a product of $N$ equivalent passive systems is passive, then the distribution may be termed $N$-passive. A completely passive distribution (equivalent to the mutual stable equilibrium of [HG76]) is one that is $N$-passive for all finite $N$.

The necessary and sufficient condition for $N$-passivity is that, for all combinations of natural numbers $\{a_i\}$ and $\{b_j\}$ such that

$$\sum_i a_i = \sum_j b_j = N \tag{111}$$

then

$$\left(\sum_i a_iE_i < \sum_j b_jE_j\right) \Leftrightarrow \left(\prod_i p_i^{a_i} > \prod_j p_j^{b_j}\right) \tag{112}$$

To simplify this, consider just three levels, $E_i < E_j < E_k$, and $b_j = N$. It is necessary that either

$$NE_j < (N-n)E_i + nE_k \quad\text{and}\quad (p_j)^N > (p_i)^{N-n}(p_k)^n \tag{113}$$

or

$$NE_j > (N-n)E_i + nE_k \quad\text{and}\quad (p_j)^N < (p_i)^{N-n}(p_k)^n \tag{114}$$

for $0 < n < N$.

Now, as $NE_i < NE_j < NE_k$ and $(p_i)^N > (p_j)^N > (p_k)^N$, there must exist real numbers $0 < l_{(ijk)}, m_{(ijk)} < N$ such that

$$NE_j = (N - l_{(ijk)})E_i + l_{(ijk)}E_k, \qquad (p_j)^N = (p_i)^{N - m_{(ijk)}}(p_k)^{m_{(ijk)}} \tag{115}$$



6 Consider $E_1 = 1$, $E_2 = 3$, $E_3 = 4$ and $p_1 = 1/2$, $p_2 = 1/3$, $p_3 = 1/6$. Energy may be extracted by swapping the $|E_1E_3\rangle$ and the $|E_2E_2\rangle$ states.

7 Or $E_2 = 2$, with the other values as in footnote 6.
For the system to be $N$-passive, $l_{(ijk)}$ and $m_{(ijk)}$ cannot be separated by any integer (as any such integer will yield an $n$ for which the $N$-passive conditions fail). This must hold for all $(ijk)$ triples of energy levels.

If we rewrite the above equations as

$$l_{(ijk)} = N\frac{E_j - E_i}{E_k - E_i} \tag{116}$$

then

$$m_{(ijk)} = N\frac{\ln(p_i/p_j)}{\ln(p_i/p_k)} \tag{117}$$

$$l_{(ijk)} - m_{(ijk)} = N\left(\frac{E_j - E_i}{E_k - E_i} - \frac{\ln(p_i/p_j)}{\ln(p_i/p_k)}\right) \tag{118}$$

Unless

$$\frac{E_j - E_i}{E_k - E_i} = \frac{\ln(p_i/p_j)}{\ln(p_i/p_k)} \tag{119}$$

it is clear that if $N$ becomes sufficiently large then $|l_{(ijk)} - m_{(ijk)}| > 1$. When this happens, the combined distribution is no longer passive, as there must exist an integer $n$ between $l_{(ijk)}$ and $m_{(ijk)}$.

To be completely passive, it must be the case that $l_{(ijk)} = m_{(ijk)}$ for each triple. This leads to

$$\frac{\ln(p_i/p_j)}{E_j - E_i} = \frac{\ln(p_i/p_k)}{E_k - E_i} \tag{120}$$

For this to hold for any triplet $(ijk)$ of energy levels, then

$$\frac{\ln(p_i/p_j)}{E_j - E_i} = \beta \tag{121}$$

where $\beta$ is a constant. Further rearranging shows

$$\ln(p_i) + \beta E_i = \ln(p_j) + \beta E_j = \ln\lambda \tag{122}$$

where $\lambda$ is also a constant, giving⁸:

$$p_i = \lambda e^{-\beta E_i} \tag{123}$$

for all $i$. It is a simple matter to verify that this distribution is sufficient for $N$-passivity. As the value of $N$ has disappeared, it is apparent that the canonical distribution must be $N$-passive for any $N$. It follows that the canonical distribution is the unique completely passive distribution for separable Hilbert spaces [PW78] [Len78] [Siw80].
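The argument above can be illustrated by brute force. This Python sketch (function names hypothetical) enumerates all product states of $N$ copies of a three-level distribution diagonal in the energy basis and tests passivity directly, comparing energies and log-probabilities. With the values of footnote 7, which satisfy Equation (110), the distribution is $N$-passive for small $N$ but fails at a finite $N$, whereas a canonical distribution (here with an arbitrary $\beta = 0.7$) passes every $N$ checked.

```python
import math
from itertools import combinations_with_replacement

def is_n_passive(energies, probs, n):
    """Brute-force test that the n-fold product of a distribution diagonal in the
    energy basis is passive: no product state may have strictly lower energy and
    strictly lower probability than another (compare Equation (112))."""
    logp = [math.log(p) for p in probs]
    # Product states built from the same multiset of levels share their energy and
    # probability, so enumerating multisets of n levels is enough.
    states = [
        (sum(energies[i] for i in combo), sum(logp[i] for i in combo))
        for combo in combinations_with_replacement(range(len(energies)), n)
    ]
    return not any(e1 < e2 and lp1 < lp2 for e1, lp1 in states for e2, lp2 in states)

def first_failure(energies, probs, n_max):
    """Smallest n <= n_max for which n-passivity fails, or None if none does."""
    for n in range(1, n_max + 1):
        if not is_n_passive(energies, probs, n):
            return n
    return None

energies = [1, 2, 4]
probs = [1 / 2, 1 / 3, 1 / 6]  # footnote 7 values: these satisfy Equation (110)

beta = 0.7  # arbitrary illustrative value
z = sum(math.exp(-beta * e) for e in energies)
canonical = [math.exp(-beta * e) / z for e in energies]

print(first_failure(energies, probs, 15))      # expected: 11
print(first_failure(energies, canonical, 15))  # expected: None
```

For the footnote 7 values the first failure occurs at $N = 11$, where $11E_2 = 22$ lies below $7E_1 + 4E_3 = 23$ while $(p_2)^{11} < (p_1)^7(p_3)^4$; this matches $l_{(123)} = N/3$ and $m_{(123)} = N\ln(3/2)/\ln 3 \approx 0.369N$ first straddling an integer.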



4.4 Summary 

Does much of the above seem somewhat obvious? We should hope so! All we have done is to 
apply the normal rules of probability theory to Hamiltonian evolutions. We have derived average 
terms for the rate of change of energy, rate of work and the flow of energy between two interacting 
systems. We need make no reference to thermodynamic concepts to do this. There is no need to 
introduce approximations and the results apply to systems with any (finite) number of degrees of 
freedom. 

8 Note: passivity requires $\beta > 0$.
Remarkably, we have shown that there is a maximum amount of energy, the adiabatic availability of the system, that can be extracted as work from an isolated system in a specific type of cyclic process. Not all of the energy of a system is available for work. This conclusion can be drawn without needing to invoke notions of entropy or consider thermal heat baths or engines operating between them.

Most remarkably, we have a concept of passivity that seems similar to thermal equilibrium, and a stronger concept of complete passivity, or mutual stable equilibrium, and the only completely passive distribution is the canonical distribution! Yet we have at no point referred to any thermal concepts, whether entropy, temperature or thermal equilibrium.

There may be disagreement over when a probabilistic statement can be justified. There may be 
disagreement over what such a statement means. There may even be disagreement over what the 
value of the probability is. Whenever probabilistic statements are justified, the results given here 
follow. 

5 Statistical Temperature 

Having established the rules of statistical mechanics, we now need to see if it can account for thermal 
phenomena. We wish to avoid, as far as possible, assuming any of the structure of thermodynamics, 
and instead focus upon the physical phenomena. It is tempting to go from complete passivity to 
a Large Completely Passive Assembly, and then identify this with an environmental heat bath. 
We will continue to resist temptation! We have not yet justified that the physical systems we characterise as thermal states are actually canonical distributions.

To do this, we will deduce, from a set of observations, that there is only one possible way to 
represent thermal states in the context of statistical mechanics and that is the canonical distribution. 
No reference to ideas of entropy, heat baths, approaches to equilibrium, or information theory, will 
need to be used. The analysis here was inspired largely by [Szi25], although the presentation differs quite significantly.

5.1 Some properties of temperature 

Statistical mechanics has a far broader scope than thermal phenomena. To see how statistical 
mechanics deals with thermal phenomena we must first identify what thermal phenomena are, and 
how this restricts the description of the systems to which we wish to apply the methods of statistical 
mechanics. We do not derive the concept of temperature. Instead we consider temperature an empirically observed phenomenon and ask what the theoretical description of such a phenomenon could be. What are the empirical properties we know about these thermal systems, and what constraints do these lay upon what physical states can represent them?

Our approach will be to start by examining the phenomena of temperature, taking it for granted 
that we have some notion of temperature from our experience of things being hot and cold. We 
will not need to consider what it means for one system to be hotter than another, only what it 
means for two systems to be at the same temperature as each other. 

We will state the following properties of two systems that are at the same temperature: 

1. No spontaneous flow of energy. 

If two systems in isolation are at the same temperature, then if they interact with each other there can be no mean flow of energy between them. Energy may be exchanged in individual systems, as fluctuations, but the expectation value for the exchange must be zero.
2. Composition.

Temperature is not changed by combining systems at the same temperature. When two systems are at the same temperature as each other, the joint system formed by combining those two systems is also a system at that same temperature.

These two observations are all we need to derive thermal physics from statistical mechanics. 
We will also show that the composition property may be replaced by the following two conditions: 

2. (a) Transitivity. 

Temperature is transitive between systems. When two systems are each individually at 
the same temperature as a third system, then they are at the same temperature as each 
other. 

(b) Universality. 

The property of being at a particular finite temperature can hold for every possible system. So, for any given Hamiltonian, there exists at least one distribution that corresponds to each temperature.

5.2 Deriving the temperature distribution 

We wish to regard the previous statements as providing a set of empirical observations that we are going to use to deduce how thermal states need to be treated in statistical mechanics.

1. No spontaneous flow of energy. 

First we must clarify what the observed phenomenon is, and secondly, how it can be represented.

In general, when two objects, initially non-interacting, are brought into contact and allowed to interact through that contact, their states have changed by the time they are separated. Careful calorimetry experiments, based upon the work required to effect the equivalent changes to those systems when isolated from each other, allow us to identify a quantity of energy that was exchanged between the two systems.

When systems are at the same temperature, the quantity of energy exchanged is zero. This needs two qualifications. When the measurements are sufficiently sophisticated to include fluctuation phenomena, the energy exchanged in any one instance of contact between two systems may be non-zero; the energy exchange is still zero on average. Also, we must exclude interactions that can cause chemical or nuclear reactions. To do this, we require the output density matrices to have non-zero eigenvalues only in those regions of state space for which the input density matrices have non-zero eigenvalues.

Before contact, the systems are separated, and so are described by a product density matrix with a non-interacting Hamiltonian:

$$\rho_0 = \rho_1\otimes\rho_2, \qquad H_0 = H_1\otimes I_2 + I_1\otimes H_2 \tag{124}$$

In principle we could allow an interaction Hamiltonian $V_{12}$ subject to $\mathrm{Tr}[V_{12}\rho_0] = 0$.

The two systems are now brought into contact. This can happen in two possible ways: either the interaction Hamiltonian $V_{12}$ is changed so that it becomes non-zero for $\rho_0$, or at least one of the system Hamiltonians is changed so that $\rho_0$ itself evolves into a state $\rho$ for which $\mathrm{Tr}[V_{12}\rho] \neq 0$. We will represent both cases by simply assuming overall a time-varying Hamiltonian, $H(t)$, such that $H(t) = H_0\ \forall t < 0$. As the interaction is of a time-limited duration, we also require that there exists a time $\tau$ for which $H(t) = H_0\ \forall t > \tau$.

We now want to focus on the concept of a spontaneous flow. We cannot take this to mean 
in the absence of all interventions, as we are having to intervene to bring the systems into 
contact, then separate them. This involves a time variation in the Hamiltonian, and the 
possibility of work being performed upon the system. For a flow of energy to be spontaneous, 
therefore, we add the restriction that the net work performed upon the joint system, over the 
course of the interaction, be zero. 

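We can summarise this as follows:

$$\begin{aligned} U &= \mathcal{T}\exp\left(-\frac{i}{\hbar}\int_0^\tau H(t)\,dt\right), \qquad \rho_\tau = U\rho_0U^\dagger\\ W &= \mathrm{Tr}[H_0(\rho_\tau - \rho_0)]\\ \Delta E_1 &= \mathrm{Tr}[(H_1\otimes I_2)(\rho_\tau - \rho_0)]\\ \Delta E_2 &= \mathrm{Tr}[(I_1\otimes H_2)(\rho_\tau - \rho_0)] = W - \Delta E_1\end{aligned} \tag{125}$$

where $\mathcal{T}$ denotes time ordering. A necessary condition for $\rho_1$ and $\rho_2$ to be at the same temperature as each other is that, for all $H(t)$ such that $W = 0$, $\Delta E_1 = 0$. In other words, if the two systems are at the same temperature, then provided no work is performed upon the joint system, there can be no exchange of energy between the two systems.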

We can also see that, for this condition to be met, it is necessary to consider only the overall unitary evolutions $U$, and not the detailed interaction $H(t)$. If we rewrite the condition as: for $\rho_1$ and $\rho_2$ to be at the same temperature as each other, then for all $U$ for which $W = 0$, it must be the case that $\Delta E_1 = 0$; we can immediately note that $\rho_0$ must be a passive distribution.

If $\rho_0$ is not a passive distribution, then it is possible to extract its adiabatic availability as work. We can then apply that same quantity of work to either system in isolation. The net work is zero, but energy can be exchanged. It follows that, if $\rho_0$ is not passive, there exists a unitary evolution that violates the conditions for $\rho_1$ and $\rho_2$ to be at the same temperature, as a Hamiltonian interaction may always be constructed to implement the unitary evolution.
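The passivity requirement on $\rho_0 = \rho_1\otimes\rho_2$ can be illustrated with two two-level systems. In this Python sketch the gaps are hypothetical, and the populations are parametrised by $p_1/p_0 = e^{-b\,\text{gap}}$ purely for convenience; the parametrisation is not an assumption of the argument. When both systems share the same parameter $b$ the product state has zero adiabatic availability, while different values leave availability that could be extracted as work and then returned to one system alone.

```python
import math
from itertools import product

def two_level_populations(gap, b):
    """Populations (ground, excited) of a two-level system with p1/p0 = exp(-b * gap).
    This parametrisation is only a convenience for the example."""
    z = 1.0 + math.exp(-b * gap)
    return [1.0 / z, math.exp(-b * gap) / z]

def joint_availability(gap1, probs1, gap2, probs2):
    """Adiabatic availability of rho1 (x) rho2 with H0 = H1 (x) I + I (x) H2,
    for states diagonal in the energy basis."""
    energies, probs = [], []
    for (i, p), (j, q) in product(enumerate(probs1), enumerate(probs2)):
        energies.append(i * gap1 + j * gap2)   # level i of system 1, level j of system 2
        probs.append(p * q)
    current = sum(e * p for e, p in zip(energies, probs))
    passive = sum(e * p for e, p in zip(sorted(energies), sorted(probs, reverse=True)))
    return current - passive

gap1, gap2 = 1.0, 2.0  # hypothetical gaps
same = joint_availability(gap1, two_level_populations(gap1, 1.0),
                          gap2, two_level_populations(gap2, 1.0))
different = joint_availability(gap1, two_level_populations(gap1, 1.0),
                               gap2, two_level_populations(gap2, 0.2))
print(same, different)
```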

2. Composition. 

If two systems are individually at the same temperature T, then the combined system is also 
at that same temperature T. This is a surprising property. It is not directly deducible from 
the more familiar transitivity property of temperature, but certainly embodies one of our 
intuitive notions of what it means for two systems to be at the same temperature. 

Let us express the concept of 'being at the same temperature' as a relationship, so that $\rho_1 \sim \rho_2$ means $\rho_1$ and $\rho_2$ are at the same temperature. The compositional property states:

$$\rho_1 \sim \rho_2 \Rightarrow \rho_1 \sim \rho_1\otimes\rho_2 \tag{126}$$

We also know, from the first property, that if $\rho_1 \sim \rho_2$, then $\rho_1\otimes\rho_2$ is passive. As $\rho_1 \sim \rho_1$, it follows by induction that $\rho_1^{\otimes N}$ is passive for all $N$. It follows that, for $\rho_1$ to be a temperature state, it must be completely passive, and hence a canonical distribution.

The relationship is clearly symmetric, so that $\rho_1 \sim \rho_2 \Leftrightarrow \rho_2 \sim \rho_1$, and reflexive, so that $\rho_1 \sim \rho_1$.
This compositional property of thermal states is rarely emphasised, although it plays a very
strong role in our intuitive sense of what is required of a state, for it to be thermal. The transitive 
property of temperature is more usually encountered. Can transitivity be used to deduce the 
canonical distribution? 

2. (a) Transitivity 

The transitivity of temperature is closely related to the operational requirement that 
any two systems, when usable as thermometers, must agree when systems are at the 
same temperature. When applied to statistical states, it defines a necessary, but not 
sufficient, requirement for the states to be considered at the same temperature. 
The first requirement gives us a necessary condition for $\rho_1$ and $\rho_2$ to be at the same temperature: that $\rho_1\otimes\rho_2$ be passive. Suppose there is a third system $\rho_3$. The fact that $\rho_1\otimes\rho_3$ and $\rho_2\otimes\rho_3$ may both be passive is not sufficient to ensure $\rho_1\otimes\rho_2$ is passive. Transitivity is, therefore, a further restriction.

However, if $\rho_1\otimes\rho_2$ is not passive, we are unable to say whether it is $\rho_1$ or $\rho_2$ (or neither) that could still be regarded as being at the same temperature as $\rho_3$. All that we can say is that, for a collection of distributions $\{\rho_1,\ldots,\rho_n\}$ to be all at the same temperature $T$, it is necessary that every combination $\rho_i\otimes\rho_j$ be a passive distribution. If we introduce a new distribution which is only jointly passive with some of the original collection, is it the new system that is not at the same temperature, or was it our original collection that was not a true collection of distributions at the same temperature?

(b) Universality. We now introduce the assumption that temperature be universal. By this we mean that, for any system Hamiltonian and any finite temperature, there is at least one distribution over the energy spectrum of that Hamiltonian that corresponds to that temperature.

In other words, under the assumption of universality, given a collection $\{\rho_1,\ldots,\rho_n\}$ of distributions at the same temperature, and any Hamiltonian $H_{n+1}$ of a new system, it must always be possible to find a distribution $\rho_{n+1}$ over the new system such that $\rho_i\otimes\rho_{n+1}$ is passive for all $1 \leq i \leq n$.

We can use these properties to establish that the ratio of the probabilities of any two energy 
levels must be a single valued function of their energy difference, and the temperature. 

First suppose we have a given system with two energy levels separated by the gap $E_1 - E_0 = \Delta$, whose probabilities have the ratio $p_1/p_0 = \Pi$. If we consider two levels, $i$ and $j$, of any other system, then comparison of the $|E_0E_i\rangle$ and $|E_1E_j\rangle$ levels yields:

$$E_i - E_j > \Delta \Leftrightarrow \frac{p_i}{p_j} < \Pi \tag{127}$$

Now we consider all Hamiltonians that contain levels with the energy gap $\Delta$. This will yield maximum and minimum values of $\Pi$ for that energy gap, and so:

$$E_i - E_j > \Delta \Leftrightarrow \frac{p_i}{p_j} < \Pi_{\min}, \qquad E_i - E_j < \Delta \Leftrightarrow \frac{p_i}{p_j} > \Pi_{\max} \tag{128}$$

As we can do this for all values of $\Delta$, we generate two functions $\Pi_{\max}(\Delta)$ and $\Pi_{\min}(\Delta)$. A little thought shows that, given that $\Pi_{\max}(\Delta') < \Pi_{\min}(\Delta)$ for all $\Delta' > \Delta$, these functions must both be piecewise continuous.
We next demonstrate that $\Pi_{\max}(\Delta) = \Pi_{\min}(\Delta)$. Consider a Hamiltonian with four non-degenerate energy levels, for which the energy gaps between the lowest and the three higher levels are $\Delta - \delta_1$, $\Delta - \delta_2$ and $\Delta + \delta_3$ respectively, such that $\delta_1 > \delta_2$ and $\delta_1, \delta_2, \delta_3 > 0$. The inverse ratio of the probability of the lowest energy level to the $\Delta - \delta_1$ energy level is $\Pi_1$, to the $\Delta - \delta_2$ energy level is $\Pi_2$, and to the $\Delta + \delta_3$ energy level is $\Pi_3$.

Now consider the product of two systems with these four energy levels. Comparing the $|E_2,E_2\rangle$ with the $|E_1,E_3\rangle$ levels, if $\delta_1 - \delta_3 > 2\delta_2$ then passivity requires $(\Pi_2)^2 \leq \Pi_1\Pi_3$.

By the definition of $\Pi_{\max}(\Delta)$ and $\Pi_{\min}(\Delta)$, as we vary through Hamiltonians so that $\delta_1, \delta_2, \delta_3 \to 0$, we have

$$\Pi_1, \Pi_2 \to \Pi_{\max}(\Delta), \qquad \Pi_3 \to \Pi_{\min}(\Delta) \tag{129}$$

Provided we maintain $\delta_1 - \delta_3 > 2\delta_2$, such as by $\delta_2 = \frac{1}{4}(\delta_1 - \delta_3)$, then

$$\Pi_{\max}(\Delta)^2 \leq \Pi_{\max}(\Delta)\,\Pi_{\min}(\Delta) \tag{130}$$

which gives $\Pi_{\max}(\Delta) \leq \Pi_{\min}(\Delta)$; but $\Pi_{\max}(\Delta) \geq \Pi_{\min}(\Delta)$ by definition, so $\Pi_{\max}(\Delta) = \Pi_{\min}(\Delta) = \Pi(\Delta)$.

We have now demonstrated that the property of universality of temperature requires that there exists, for each value $T$ of temperature, a function $\Pi_T(\Delta)$ such that

$$\frac{p_i}{p_j} = \Pi_T(E_i - E_j) \tag{131}$$

It is a simple matter to deduce the unique function that satisfies this. Considering a third level $k$,

$$\frac{p_i}{p_j} = \Pi_T(E_i - E_j) = \frac{p_i}{p_k}\,\frac{p_k}{p_j} = \Pi_T(E_i - E_k)\,\Pi_T(E_k - E_j) \tag{132}$$

Writing $\Delta_1 = E_k - E_j$ and $\Delta_2 = E_i - E_k$ gives

$$\Pi_T(\Delta_1 + \Delta_2) = \Pi_T(\Delta_1)\,\Pi_T(\Delta_2) \tag{133}$$

We find a variation on the Darboux relationship [Dar80], which has the solution:

$$\frac{p_i}{p_j} = e^{-\beta(T)(E_i - E_j)} \tag{134}$$
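The step from Equation (133) to Equation (134) can be spelled out. Taking logarithms turns the multiplicative relation into Cauchy's additive equation, whose solutions are linear once the result for rational multiples is extended to all real gaps using the piecewise continuity established above:

$$f(\Delta) \equiv \ln\Pi_T(\Delta): \quad f(\Delta_1 + \Delta_2) = f(\Delta_1) + f(\Delta_2) \;\Rightarrow\; f\!\left(\tfrac{m}{n}\Delta\right) = \tfrac{m}{n}f(\Delta) \;\Rightarrow\; f(\Delta) = -\beta(T)\,\Delta \;\Rightarrow\; \Pi_T(\Delta) = e^{-\beta(T)\Delta}$$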

We can rearrange this as

$$p_ie^{\beta(T)E_i} = p_je^{\beta(T)E_j} = \lambda \tag{135}$$

where $\lambda$ is a constant, which by normalisation of the probabilities is

$$\lambda = \left(\sum_i e^{-\beta(T)E_i}\right)^{-1}$$

Transitivity therefore yields the canonical distribution, but only if we supplement it with the 
requirement that temperature be universal. 






5.3 Comments 



The concept of passivity is clearly closely related to the concept of equilibrium. However, we 
have not made any assumptions regarding whether systems, not in equilibrium, must evolve into 
a state of equilibrium, or whether systems, in equilibrium, may spontaneously be found out of 
equilibrium. The temporal asymmetry associated with the concept of equilibrium [BU01], which
is so problematical for the relationship of statistical mechanics to thermal physics, has not been 
assumed. 

It appears we do not need to assume that isolated systems tend to equilibrium to understand 
what a thermal state must be. The only sense of equilibrium that we may be argued to have used here is one in which two states at the same temperature may be regarded as being in equilibrium with each other. This is a relationship between systems, not a property of individual systems.

Having said that, if the result of thermalising a system did not lead to a passive distribution, 
then clearly we could create perpetual motion: simply isolate the thermalised system, extract the 
available energy, then restore thermal contact and allow the system to return to the thermalised 
state. 

Another sense of equilibrium, that a distribution is in equilibrium if it is constant in time, also 
leads to the conclusion that the distribution must be diagonalised in the energy eigenbasis. Of course, we have seen that the concept of passivity is also sufficient to deduce that the density matrix is diagonal in the energy eigenbasis.

That the compositional property rules against the micro canonical distribution was noted, in 
passing, by Gibbs[Gib02] [pg. 170-4]. It is key (although not obviously so) to Szilard's derivation 
of the canonical distribution for classical systems [Szi25 1. 

It is the factorisability of the joint probability distribution of the combined state that leads 
uniquely to the canonical distribution. This is deeply related to the development of the statistical 
mechanical account of thermal phenomena. Non-extensive entropies must, at least implicitly, deny 
this property. Both the extensivity of Gibbs entropy and the additivity of Shannon information are, 
formally, closely related to this. The extensivity of entropy is a somewhat abstract concept, which 
has something of the status of convention even in classical phenomenological thermodynamics. 
Shannon information has no obvious a priori relationship to thermal phenomena and has played no 
role in this derivation. The compositional property of temperature is a verifiable physical property 
of thermal states. 
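The factorisation property can be illustrated numerically. The sketch below (hypothetical energy levels; standard-library Python) checks that the product of two canonical distributions with the same $\beta$ is exactly the canonical distribution of the combined, non-interacting Hamiltonian $H_1 + H_2$. With different parameters, a single $\beta$ fitted from one probability ratio fails to reproduce the rest of the joint distribution:

```python
import math
from itertools import product

def canonical(energies, beta):
    w = [math.exp(-beta * e) for e in energies]
    z = sum(w)
    return [x / z for x in w]

def joint(p1, p2):
    """Joint distribution of two uncorrelated systems, ordered as (a, b) pairs."""
    return [x * y for x, y in product(p1, p2)]

def combined_levels(e1, e2):
    """Levels E_a + E_b of the non-interacting composite, in the same (a, b) order."""
    return [x + y for x, y in product(e1, e2)]

def max_dev(p, q):
    return max(abs(x - y) for x, y in zip(p, q))

e1 = [0.0, 1.0, 2.5]   # hypothetical levels of system 1
e2 = [0.0, 0.4]        # hypothetical levels of system 2
E = combined_levels(e1, e2)

# Same beta: the joint distribution is canonical for H1 + H2 at that beta.
dev_same = max_dev(joint(canonical(e1, 2.0), canonical(e2, 2.0)), canonical(E, 2.0))

# Different betas: fit one beta from the ratio of the (1, 0) and (0, 0) entries
# (indices 2 and 0), then compare the whole distribution with that canonical one.
mixed = joint(canonical(e1, 2.0), canonical(e2, 0.5))
beta_fit = -math.log(mixed[2] / mixed[0]) / (E[2] - E[0])
dev_mixed = max_dev(mixed, canonical(E, beta_fit))
```

Here `dev_same` vanishes to rounding, while `dev_mixed` is of order $10^{-1}$.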

5.4 The Ideal Gas Scale 

The temperature scale is largely a matter of convention. Consistency amongst different opera- 
tionally defined temperatures requires that any two thermal states considered to be at the same 
temperature with respect to one scale, are at the same temperature with respect to any other scale. 
It is usually also assumed that the primitive ordering of hotter than/colder than is preserved
across all temperature scales. This would mean that any temperature scale can be expressed as a 
monotonic function of any other scale. 

We could choose to use $\beta(T)$ as the definition of our statistical temperature scale. Provided $\beta$ is
a single valued function, it satisfies the consistency requirement. To verify the ordering relationship,
we will identify a particular operationally defined temperature scale and find how $\beta(T)$ varies with
that.

We will make the choice of the ideal gas scale. It is empirically observed that, for a number of
gases, the relationship:

$$ PV = NRT \tag{137} $$

appears to hold, where the number of moles of the gas is $N$ and $R$ is the molar gas constant. It is
hypothesised that this holds exactly for ideal gases.

Standard textbook analysis of a canonically distributed system gives the result that an ideal
quantum gas of $n$ particles, confined to a box of volume $V$, exerts a mean pressure $P$ on the walls of the box of:

$$ PV\beta(T) = n \tag{138} $$

Defining the Boltzmann constant $k = RN/n$ gives

$$ \beta(T) = \frac{1}{kT} \tag{139} $$

As this is a single valued, monotonic function of $T$, it follows that $\beta(T)$ is a good temperature scale,
provided the ideal gas scale, $T$, is a good temperature scale.
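The bookkeeping behind Equations 137 to 139 can be checked numerically. The sketch below assumes the exact 2019 SI values of $R$ and of the Avogadro constant (the latter is not used in the text), with a hypothetical amount of gas and volume. It recovers the Boltzmann constant, confirms that $\beta(T)$ decreases strictly with $T$, and checks that the two pressure expressions agree:

```python
import math

R = 8.31446261815324    # molar gas constant, J/(mol K); exact in the 2019 SI
N_A = 6.02214076e23     # Avogadro constant, 1/mol; exact in the 2019 SI

def boltzmann_constant(moles, particles):
    """k = R N / n for N moles of gas containing n particles."""
    return R * moles / particles

def beta_of_T(T, k):
    """Eq. 139: beta(T) = 1 / (k T)."""
    return 1.0 / (k * T)

N = 2.0          # hypothetical amount of gas, in moles
n = N * N_A      # corresponding number of particles
k = boltzmann_constant(N, n)

temps = [50.0, 100.0, 273.15, 300.0, 1000.0]
betas = [beta_of_T(T, k) for T in temps]
# Single valued and strictly decreasing: hotter always means smaller beta.
strictly_decreasing = all(b1 > b2 for b1, b2 in zip(betas, betas[1:]))

# Eq. 137 and Eq. 138 give the same mean pressure for a hypothetical volume V.
V, T = 0.05, 300.0
P_gas_law = N * R * T / V                   # from PV = NRT
P_canonical = n / (V * beta_of_T(T, k))     # from PV beta(T) = n
```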



6 Statistical Thermal States 

We have identified thermal states as uniquely represented in statistical mechanics by canonical prob-
ability distributions. The canonical probability distribution is characterised by a single parameter,
$\beta(T)$, which can be related to the reciprocal of the ideal gas scale. We will now consider what we
can deduce, solely from the identification of thermal states as canonical probability distributions, 
using the techniques of statistical mechanics. 

6.1 Mean flow of energy 

We can immediately state the following consequence of Equation ??:

$$ \Delta H_1(\beta_1 - \beta_2) \geq 0 \tag{140} $$

If two thermal states with different parameters $\beta_1$ and $\beta_2$ interact, and $\beta_1 > \beta_2$, then $\Delta H_1 \geq 0$:
the mean flow of energy can only be from system 2 to system 1.
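A minimal two-qubit model illustrates Equation 140. In this sketch (a hypothetical model, not one taken from the text) each thermal system is a two-level system with splitting $\epsilon$, and the two interact through an energy-conserving partial swap of $|01\rangle$ and $|10\rangle$. Because the initial product state is diagonal, only populations need to be tracked:

```python
import math

def qubit_populations(beta, eps):
    """Canonical populations (ground, excited) of a two-level system with splitting eps."""
    w = math.exp(-beta * eps)
    return 1.0 / (1.0 + w), w / (1.0 + w)

def energy_into_system_1(beta1, beta2, eps, theta):
    """Mean energy change Delta H_1 after the unitary exp(-i theta (|01><10| + |10><01|)).

    |01> and |10> carry the same total energy eps, so total energy is conserved and no
    interaction energy remains once the contact has ceased. The initial product state is
    diagonal, so only the populations of |01> and |10> are needed.
    """
    a0, a1 = qubit_populations(beta1, eps)   # system 1
    b0, b1 = qubit_populations(beta2, eps)   # system 2
    p01, p10 = a0 * b1, a1 * b0               # |10> means system 1 excited, system 2 not
    s = math.sin(theta) ** 2
    p10_after = (1.0 - s) * p10 + s * p01
    return eps * (p10_after - p10)

# System 1 colder (beta1 > beta2): energy flows into system 1.
print(energy_into_system_1(2.0, 0.5, 1.0, 0.7) > 0)   # True
# System 1 hotter (beta1 < beta2): energy flows out of system 1.
print(energy_into_system_1(0.5, 2.0, 1.0, 0.7) < 0)   # True
```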



6.2 Thermal cycles 



Next consider a system with an arbitrary probability distribution, $\rho^{(0)}$, initially neither correlated nor
interacting with any other system. The system is now brought into successive contact with a series
of systems in thermal states, where the state of system $i$ is parameterised by $\beta_i$. After each contact
has ceased, the expectation value of the interaction energy with the $i$th system is zero. The internal
Hamiltonians of the canonically distributed systems are constant in time.



Let $\rho^{(i)}$ be the marginal probability distribution of the system after interacting with the
$i$th thermal system. Let $\rho_i(\beta_i)$ be the initial probability distribution of the $i$th thermal system, and $\rho'_i$
be its marginal probability distribution afterwards. Let $\Delta H_i$ represent the mean energy flow into
the $i$th thermal system. We do not assume any of the systems are canonically distributed after the
interaction.

Now, purely from the mathematical properties of the Gibbs-von Neumann measure under Hamiltonian
evolution and of canonical distributions (Section 2), we can state:



G 



(i-i) 



G[Pi. 



+ G[ P 0i)] 
-G[ P 0i)] 



> 
> 



G 



PiAH, 



+ G[ P [] 



(141) 
(142) 
Adding these together and summing over all $N$ interactions, we get

$$ G[\rho^{(0)}] - G[\rho^{(N)}] + \sum_{i=1}^{N} \beta_i \Delta H_i \geq 0 \tag{143} $$

from which it follows that, if the series of interactions returns the system's final
marginal probability distribution to its initial probability distribution, $\rho^{(N)} = \rho^{(0)}$, then:

$$ \sum_i \beta_i \Delta H_i \geq 0 \tag{144} $$



No physical interpretation need be placed upon the Gibbs-von Neumann measure $G$ to derive these
results. We are deducing a property (Equation 144) of canonical distributions under Hamiltonian
evolution.

We have not needed to assume that the system Hamiltonian returns to its initial value. If it
does, the mean work performed over the course of the cycle is

$$ \Delta W = \sum_i \Delta H_i \tag{145} $$
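The chain of inequalities 141 to 144 can be checked in the same hypothetical two-level model: a system qubit makes successive resonant partial-swap contacts with fresh canonical qubits at parameters $\beta_i$. This sketch verifies Equation 143 for a random sequence of contacts. It also verifies Equation 144 for a closed cycle built from two full swaps (hot bath, then cold bath), which returns the system exactly to its initial distribution:

```python
import math
import random

def G(pops):
    """Gibbs-von Neumann measure G = sum p ln p of a diagonal distribution (0 ln 0 = 0)."""
    return sum(p * math.log(p) for p in pops if p > 0)

def canonical_qubit(beta, eps):
    w = math.exp(-beta * eps)
    return 1.0 / (1.0 + w), w / (1.0 + w)

def contact(system, beta, eps, theta):
    """One contact with a fresh canonical qubit through a resonant partial swap.

    Returns the system's new marginal populations and Delta H_i, the mean energy
    that flows into the thermal qubit (the swap conserves total energy).
    """
    q0, q1 = system
    b0, b1 = canonical_qubit(beta, eps)
    s = math.sin(theta) ** 2
    q1_new = q1 + s * (q0 * b1 - q1 * b0)
    return (1.0 - q1_new, q1_new), -eps * (q1_new - q1)

def run(initial, contacts, eps=1.0):
    """Apply contacts [(beta_i, theta_i), ...]; return final marginal and sum_i beta_i Delta H_i."""
    state, total = initial, 0.0
    for beta, theta in contacts:
        state, dH = contact(state, beta, eps, theta)
        total += beta * dH
    return state, total

rng = random.Random(0)
q1 = rng.random()
initial = (1.0 - q1, q1)
contacts = [(rng.uniform(0.1, 3.0), rng.uniform(0.0, math.pi / 2)) for _ in range(20)]
final, sum_beta_dH = run(initial, contacts)
lhs_143 = G(initial) - G(final) + sum_beta_dH     # Eq. 143: should be >= 0

# Closed cycle: full swap with a hot bath, then with a cold bath (Eq. 144).
beta_cold, beta_hot = 2.0, 0.5
start = canonical_qubit(beta_cold, 1.0)
end, cycle_sum = run(start, [(beta_hot, math.pi / 2), (beta_cold, math.pi / 2)])
```

In the cycle, `cycle_sum` is positive: the hot bath loses energy and the cold bath gains the same amount.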



6.3 Statistical thermalisation 

Now consider a system with an arbitrary probability distribution, $\rho^{(0)}$, initially neither correlated nor
interacting with any other system, and a Hamiltonian $H$. The system is now brought into successive
contact with a series of systems in thermal states, where each system has the same parameter $\beta$.
Let $\rho^{(i)}$ be the marginal probability distribution of the system after interacting with the $i$th thermal
system, and let $\rho(\beta)$ be the thermal state for the system with parameter $\beta$.



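$$ G[\rho(\beta)] + \beta\langle H\rangle_{\rho(\beta)} \leq G[\rho^{(i+1)}] + \beta\langle H\rangle_{\rho^{(i+1)}} \leq G[\rho^{(i)}] + \beta\langle H\rangle_{\rho^{(i)}} \leq \cdots \leq G[\rho^{(0)}] + \beta\langle H\rangle_{\rho^{(0)}} \tag{146} $$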



As equality holds at each step only if the system is already canonically distributed, $\rho^{(i)} = \rho(\beta)$, we conclude that, if
there is no physical cause that prevents it, a system with any arbitrary probability distribution
can be brought arbitrarily close to a canonical probability distribution, with the parameter $\beta$, by
a sufficiently large number of such contacts. The system becomes thermalised.
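The thermalisation argument can be watched in the hypothetical two-level model. In the sketch below, a population-inverted qubit makes repeated partial-swap contacts with fresh qubits, all at the same $\beta$. The quantity $G[\rho] + \beta\langle H\rangle$ ordered in Equation 146 never increases, and the populations approach the canonical ones:

```python
import math

def canonical_qubit(beta, eps):
    w = math.exp(-beta * eps)
    return 1.0 / (1.0 + w), w / (1.0 + w)

def contact(system, beta, eps, theta):
    """Resonant partial swap with a fresh canonical qubit; returns the system's new marginal."""
    q0, q1 = system
    b0, b1 = canonical_qubit(beta, eps)
    q1_new = q1 + math.sin(theta) ** 2 * (q0 * b1 - q1 * b0)
    return (1.0 - q1_new, q1_new)

def free_energy_like(pops, beta, eps):
    """G[rho] + beta <H> for a qubit with levels 0 and eps: the quantity ordered in Eq. 146."""
    g = sum(p * math.log(p) for p in pops if p > 0)
    return g + beta * eps * pops[1]

beta, eps, theta = 1.2, 1.0, 0.4   # hypothetical parameters
state = (0.1, 0.9)                  # a population-inverted, far-from-thermal start
values = [free_energy_like(state, beta, eps)]
for _ in range(200):
    state = contact(state, beta, eps, theta)
    values.append(free_energy_like(state, beta, eps))
target = canonical_qubit(beta, eps)
```

In this model each contact moves the excited population a fixed fraction $\sin^2\theta$ of the way to its canonical value, so the approach is geometric.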

Possible physical causes that prevent complete thermalisation include: 

1. Transitions are not permitted between different regions of the state space. Let the different
regions of state space be represented by a complete set of mutually orthogonal projectors $K_i$
onto those regions. The partial thermalisation of the density matrix $\rho$ will then be:

$$ \rho'(\beta) = \sum_i \mathrm{Tr}[K_i \rho K_i] \frac{K_i e^{-\beta K_i H K_i} K_i}{\mathrm{Tr}\left[K_i e^{-\beta K_i H K_i} K_i\right]} \tag{147} $$

2. Only part of the state space represents the system being in thermal contact (i.e.
the interaction Hamiltonian is zero for some portion of the system state space), and there are
no transitions out of that portion. If $K_\alpha$ projects onto the isolated region, and $K_\beta$ onto the
thermal contact region, partial thermalisation will lead to:

$$ \rho'(\beta) = K_\alpha \rho K_\alpha + \mathrm{Tr}[K_\beta \rho K_\beta] \frac{K_\beta e^{-\beta K_\beta H K_\beta} K_\beta}{\mathrm{Tr}\left[K_\beta e^{-\beta K_\beta H K_\beta} K_\beta\right]} \tag{148} $$
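For the special case in which every projector is spanned by energy eigenstates, the diagonal of Equations 147 and 148 can be computed directly. In this sketch, the block structure, levels and populations are hypothetical, and coherences are ignored. Each thermally connected block keeps its total probability $\mathrm{Tr}[K_i\rho K_i]$ and becomes canonical inside, while levels outside every block stay untouched, like the isolated region $K_\alpha$:

```python
import math

def partial_thermalise(populations, energies, blocks, beta):
    """Diagonal of Eq. 147 when every projector K_i is spanned by energy eigenstates.

    blocks: lists of level indices, one per thermally connected region.
    Each block keeps its total weight Tr[K_i rho K_i] and becomes canonical inside.
    Levels in no block are left untouched, like the isolated region K_alpha of Eq. 148.
    """
    result = list(populations)
    for block in blocks:
        weight = sum(populations[n] for n in block)
        z = sum(math.exp(-beta * energies[n]) for n in block)
        for n in block:
            result[n] = weight * math.exp(-beta * energies[n]) / z
    return result

energies = [0.0, 1.0, 2.0, 0.5, 1.5, 0.3]           # hypothetical levels
populations = [0.05, 0.10, 0.35, 0.30, 0.05, 0.15]  # hypothetical diagonal of rho
beta = 1.0
# Two disconnected regions {0, 1, 2} and {3, 4}; level 5 is not in thermal contact.
rho_prime = partial_thermalise(populations, energies, [[0, 1, 2], [3, 4]], beta)
```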
At the risk of getting repetitive, we restate: no physical interpretation is placed upon the Gibbs
measure $G$ to derive these results. It is used simply to establish general mathematical properties of
the evolution of distributions under Hamiltonian dynamics, and of interactions with thermal states.
The properties of these interactions can be understood without needing to physically interpret the
Gibbs measure. They are properties of Hamiltonian evolutions.

6.4 Heat Baths 

It will now be convenient to identify heat baths. A heat bath is simply a large system, with many 
degrees of freedom, in a thermal state. With many degrees of freedom it can be treated as having 
a large number of subsystems. No work is ever performed upon a heat bath. 

When a system interacts with a heat bath, it generally interacts only with one of the subsystems. 
What happens following that depends upon the details of the heat bath, the interactions between 
the subsystems and whether continued interaction with the heat bath involves continued interaction 
with the same, or a different, subsystem. We will use the symbol $\Delta Q$ to refer to mean energy flows
into a heat bath, and will refer to these as mean heat exchanges.

6.4.1 Ideal Heat baths 

An 'ideal' heat bath is one for which the subsystems are non-interacting and an individual
subsystem is never encountered twice. All the subsystems are canonically distributed with the
same $\beta$ parameter, and it is assumed there are no internal microscopic correlations. This means
a system brought into contact with an ideal heat bath experiences a succession of contacts with
independent canonically distributed systems at the same $\beta$ parameter. It will become thermalised.
An ideal heat bath may be treated as a Large Uncorrelated Canonical Assembly (Section 2.5),
which is identical to a Large Completely Passive Assembly. The interaction with each subsystem
will be for a very short time¹⁰. From the completely passive properties of thermal systems, we can
state immediately that it is impossible to extract work from any number of ideal heat baths at the same
temperature.

6.4.2 Real Heat baths 

Of course, as soon as the systems have interacted, correlations develop and do not disappear. 
Subsystems of real heat baths interact with each other. Subsystems may be re-encountered. 

To judge the consequences of these requires real models of physical heat baths. For example, one 
property of weakly interacting subsystems [Par89] is a tendency for correlations to become "spread 
out", so that the correlation between the system and the heat bath subsystem is reduced by the 
weak interaction amongst the heat bath subsystems. 

Real heat baths do not behave exactly as ideal heat baths. The properties we are going to 
derive based upon ideal heat baths will not, therefore, be strictly applicable to interactions with 
real heat baths. The extent to which the behaviour of real heat baths differs from that of ideal 
heat baths can only be decided by examining physical models of the real heat baths. 

That real heat baths do not behave exactly as ideal heat baths is not a fundamental problem for
statistical mechanics. Statistical mechanics should not be required to prove that real heat baths behave
as ideal heat baths - after all, they don't! When considering real heat baths, with real physical
interactions, statistical mechanics is required to accurately describe the actual behaviour of those
real heat baths, including how they deviate from being ideal heat baths, given the appropriate
description of the real physical interaction.

10 Care needs to be taken regarding the limiting case of infinitesimally short interactions, or the quantum Zeno effect will
prevent thermalisation altogether.



6.5 Statistically isothermal operations 



We now consider the limiting cases of interactions with an ideal heat bath. When a thermal
system with internal Hamiltonian $H$ is brought into contact with an ideal heat bath with a fixed $\beta$
parameter, the system is kept in a thermalised state¹¹. The resulting canonical distribution gives a
density matrix which is always diagonal with respect to the system Hamiltonian. As the system
Hamiltonian is varied over $0 \leq t \leq \tau$, the mean work performed is

$$ \Delta\langle W\rangle = \int_0^{\tau} \mathrm{Tr}\left[\rho(t)\frac{dH(t)}{dt}\right] dt \tag{149} $$

and the mean heat exchanged is

$$ \Delta\langle Q\rangle = -\int_0^{\tau} \mathrm{Tr}\left[H(t)\frac{d\rho(t)}{dt}\right] dt \tag{150} $$

The work may be re-expressed as

$$ \Delta\langle W\rangle = -\frac{1}{\beta}\int_0^{\tau} \frac{d}{dt}\ln Z(t)\, dt \tag{151} $$

$$ \Delta\langle W\rangle = -\frac{1}{\beta}\ln\frac{Z(\tau)}{Z(0)} \tag{152} $$

where

$$ Z(t) = \mathrm{Tr}\left[e^{-\beta H(t)}\right] \tag{153} $$


7 Statistical Entropy 

The results of the previous section have not, at any point, depended upon the identification of $-G$
with thermodynamic entropy. The Gibbs-von Neumann measure of a distribution has been treated 
simply as a convenient calculation tool, and has not been attributed any physical significance. 
The "entropy-like" qualities of G that have been used are simply mathematical properties of any 
distribution function under Hamiltonian evolutions. No physical interpretation has been placed 
upon them, nor was needed to use them. 

Nevertheless we have managed to derive, from Equation 144 with $\beta_i = 1/kT_i$, the inequality

$$ \sum_i \frac{\Delta Q_i}{T_i} \geq 0 \tag{154} $$

where $\Delta Q_i$ is the mean heat flow into a thermal system at temperature $T_i$, over a closed cycle. The
mean work performed over the course of the cycle is

$$ \Delta W = \sum_i \Delta Q_i \tag{155} $$

Consider a few special cases of this, for heat baths: 
11 Provided the Hamiltonian is varied only slowly.
1. If there is a single heat bath:

$$ \Delta Q \geq 0 \tag{156} $$

the mean flow of energy can only be into the heat bath.

2. If there are two heat baths, and the mean work requirement for the process is zero, $\Delta Q_1 =
-\Delta Q_2 = \Delta Q$:

$$ \Delta Q\left(\frac{1}{T_1} - \frac{1}{T_2}\right) \geq 0 \tag{157} $$

$\Delta Q > 0$ only if $T_1 \leq T_2$. The mean flow of heat into the first heat bath can be positive
only if the first heat bath is not hotter than the second.

3. If there are two heat baths, and heat is extracted from the first ($\Delta Q_1 < 0$):

$$ \frac{\Delta W}{\Delta Q_1} \leq 1 - \frac{T_2}{T_1} \tag{158} $$

This (with due regard for changes in sign to both $\Delta W$ and $\Delta Q_1$) shows the maximum
efficiency, in terms of mean work extracted over mean heat extracted, for a heat engine.
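The third case can be recovered by brute force. The sketch below uses hypothetical bath temperatures and a hypothetical grid of heats. It scans pairs of mean heats allowed by Equation 154, keeps those that extract heat from the hot bath and deliver work, and finds that no efficiency exceeds $1 - T_2/T_1$:

```python
def max_efficiency(T1, T2):
    """Bound from Eq. 158: work extracted / heat extracted <= 1 - T2/T1."""
    return 1.0 - T2 / T1

def allowed(dQ1, dQ2, T1, T2):
    """Eq. 154 for two baths; dQ1, dQ2 are mean heats *into* the baths."""
    return dQ1 / T1 + dQ2 / T2 >= 0.0

T1, T2 = 500.0, 300.0   # hypothetical hot and cold bath temperatures
best = 0.0
for i in range(1, 101):
    dQ1 = -i / 10.0                  # heat extracted from the hot bath (dQ1 < 0)
    for j in range(0, 201):
        dQ2 = j / 20.0               # heat deposited in the cold bath
        dW = dQ1 + dQ2               # mean work performed on the system (Eq. 155)
        if dW < 0 and allowed(dQ1, dQ2, T1, T2):
            best = max(best, dW / dQ1)   # efficiency: work extracted / heat extracted
```

On this grid `best` comes within about $10^{-3}$ of the bound $0.4$, which is attained only on the boundary where Equation 154 holds with equality.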



7.1 Thermodynamic entropy 

We will now, for the first time, consider phenomenological thermodynamics. The primitive exposi- 
tion of the concept of thermodynamic entropy here clearly lacks the careful rigour of such works as 
[GB91, LY99] and indeed differs greatly from them. It is closer to textbook expositions
such as [Fer37, Adk68]. Our reasons for this are simple: it is not obvious what the statistical mechanical
generalisation of thermodynamic entropy should be and there is no universal agreement on what 
properties of thermodynamic entropy are the ones to select in developing this generalisation (or 
even if such a generalisation is necessary). Our approach will be to focus upon the arguments that 
typically motivate supposing that there is such a thing as thermodynamic entropy in the first place 
and see how these arguments apply to statistical mechanics. 

The three cases in the previous section may be compared to three versions of the Second Law
of Thermodynamics:

1. No process is possible whose sole result is the extraction of heat from a heat bath and its 
conversion into work. 

2. No process is possible whose sole result is the transfer of heat from a colder to a hotter heat 
bath. 

3. No process is possible whose sole result is the extraction of heat $Q_1$ from a heat bath at
temperature $T_1$ and the deposit of heat $Q_2$ in a heat bath at temperature $T_2 < T_1$, extracting
the remainder $W = Q_1 - Q_2$ with efficiency $\eta = W/Q_1$ greater than $1 - T_2/T_1$.

An equivalent expression of the Second Law would be:

4. No process is possible whose sole result is the use of work to transfer heat between heat
baths, such that the heat deposited in the $n$th heat bath, at temperature $T_n$, is $Q_n$ and

$$ \sum_n \frac{Q_n}{T_n} < 0 \tag{159} $$



Now, the final expression leads to the following result:

If there exists a process which takes a system from state A to state B,
depositing heats $Q_m^{(1)}$ in heat baths at temperatures $T_m^{(1)}$, then there can be no process
which takes a system in state B to state A, depositing heats $Q_n^{(2)}$ in heat
baths at temperatures $T_n^{(2)}$, unless

$$ \sum_m \frac{Q_m^{(1)}}{T_m^{(1)}} + \sum_n \frac{Q_n^{(2)}}{T_n^{(2)}} \geq 0 \tag{160} $$

As this result must hold for any two states A and B in any combination of processes, this is
equivalent to the statement:

There exists a single valued property, $S$, of a state such that, if there exists a process
which takes a system from state A to state B, depositing heats $Q_m^{(1)}$ in heat
baths at temperatures $T_m^{(1)}$, then

$$ S[B] - S[A] \geq -\sum_m \frac{Q_m^{(1)}}{T_m^{(1)}} \tag{161} $$

The value of this property $S$ is not yet uniquely defined. There may be many functions which
satisfy this requirement.

If there also exists a process which takes a system from state B to state A,
depositing heats $Q_n^{(2)}$ in heat baths at temperatures $T_n^{(2)}$, such that

$$ \sum_m \frac{Q_m^{(1)}}{T_m^{(1)}} + \sum_n \frac{Q_n^{(2)}}{T_n^{(2)}} = 0 \tag{162} $$

then $S$ is uniquely defined¹² by:

$$ S[B] - S[A] = -\sum_m \frac{Q_m^{(1)}}{T_m^{(1)}} \tag{163} $$

Identifying the value of $S$ for all states then requires us to find processes in both directions which
are able to complete the cycle with the requisite minimal transfer of heat to heat baths.
If we can identify the value of $S$ for all states, then it follows:

There is no process which takes a system from state A to state B, depositing
heats $Q_m^{(1)}$ in heat baths at temperatures $T_m^{(1)}$, for which

$$ S[B] + \sum_m \frac{Q_m^{(1)}}{T_m^{(1)}} < S[A] \tag{164} $$

and finally, if we identify the change in the value of $S$ for a heat bath with $Q/T$, we get:

There is no process for which

$$ \sum \Delta S < 0 \tag{165} $$

where the sum runs over the system and all the heat baths involved. The quantity $S$ is the thermodynamic entropy of the state.



12 Up to an additive constant, of course! 
7.2 Statistical mechanical generalisation of thermodynamic entropy

The first thing we may wonder is whether there is any need to introduce the concept of a statistical
mechanical entropy. The existence of the non-decreasing function of state is equivalent to the
various operational statements of the second law, about the absence of certain kinds of processes.
One of these statements must be introduced as an axiom of thermodynamics, since whichever
statement is chosen as the second law axiom cannot otherwise be deduced.

For statistical mechanics, we can deduce statistical analogues of these statements from the properties of
Hamiltonian dynamics, probability calculus and the existence of thermal states. There is no need to
introduce a new axiom. Nevertheless, the tremendous utility of the thermodynamic entropy func- 
tion in developing phenomenological thermodynamics should suggest to us that such a function may 
be useful. Perhaps it may be possible, in principle, to develop phenomenological thermodynamics 
without introducing an entropy function, but instead, for example, rely solely upon the Kelvin 
formulation of the second law, but it would seem needlessly difficult to do so. 

If we decide to introduce such a function, let us consider the phenomenological laws which 
motivate introducing it. The first thing to note is that they are all false. It is possible to have 
processes whose sole result is to convert heat to work (just with probability less than one). It is 
possible to have processes whose sole result is to transfer energy from a colder to a hotter heat bath 
(just with probability less than one). 

This has a profound consequence for the development of a statistical mechanical entropy. The
justification for introducing entropy as a single valued property of state comes from Equation 160
above. This justification does not hold for microstates.

Hamiltonians exist which can transform any microstate into any other microstate, while ex- 
tracting arbitrarily large amounts of heat from a heat bath and converting it into work, so long as 
we are prepared to accept an arbitrarily low probability of the process occurring. No single valued 
entropy function could be deduced from this attempt. 

Attempting to fix this by demanding that the process can occur with certainty, we find that 
there is always some Hamiltonian evolution on the state space, which can perform a transformation 
between any two given microstates, without any exchange of energy with a heat bath. The same 
conclusion is reached even if we demand only that the mean transfer of heat to the heat baths be 
zero. The entropy difference between any two microstates is zero. 

7.2.1 Defining the statistical mechanical entropy 

The steps that might lead us to try to deduce the existence of an entropy function as a function 
of the microstate of a system are flawed. However, there was no reason to take those steps. The 
search for new axioms to introduce, to represent a statistical mechanical second law, is unnecessary. 
Statistical mechanics has already enabled us to deduce the property: 

No process is possible, starting with any probability distribution over a system, whose
sole result is to return the system to a marginal probability distribution equal to its
initial distribution, and to transfer mean quantities of energy $\Delta Q_i$ into systems initially
uncorrelated and canonically distributed with parameters $\beta_i$, where

$$ \sum_i \beta_i \Delta Q_i < 0 \tag{166} $$

Note that this deduction, as expressed, is a direct consequence of Hamiltonian dynamics and does 
not depend upon any identification of thermal states, temperatures or heat baths. 






Accepting the identification of thermal states as canonical distributions, and the state of a 
system being here defined as a Hilbert space, Hamiltonian operator on that space and probability 
distribution over the space, it follows: 

There exists a single valued property, $S$, of the states of systems such that, if there
exists a process which takes a system from state A to state B, depositing
mean heats $Q_m^{(1)}$ in thermal systems at temperatures $T_m^{(1)}$, then

$$ S[B] - S[A] \geq -\sum_m \frac{Q_m^{(1)}}{T_m^{(1)}} \tag{167} $$

We define this property as the statistical mechanical generalisation of thermodynamic entropy. We 
will now calculate its value. 



7.2.2 Deriving statistical mechanical entropy 

In order to fix the entropy difference between two states, it is necessary to find processes between
the states in both directions, where:

$$ \sum_m \frac{Q_m^{(1)}}{T_m^{(1)}} + \sum_n \frac{Q_n^{(2)}}{T_n^{(2)}} = 0 \tag{168} $$

For phenomenological thermodynamics, this involves reversible, quasistatic processes. These pro- 
cesses do not actually exist in reality. They are limiting processes - they are not necessarily 
attainable, but there is no physical reason one cannot get arbitrarily close to them. However, they 
are generally considered to only be possible for systems in thermal equilibrium. 

In statistical mechanics we have a more general notion of process available: Hamiltonian dy-
namics. From this we can construct reversible processes¹³ as a limiting case.

1. Isothermal processes

We first identify the entropy change for an isothermal process. Taking an ideal heat bath as
the limit of a real heat bath, we assume we can get arbitrarily close to an ideal heat bath
process. An isothermal process in contact with a single heat bath at temperature $T$ requires
mean work equal to:

$$ \Delta\langle W\rangle = -kT\ln\frac{Z(\tau)}{Z(0)} \tag{169} $$

where

$$ Z(t) = \mathrm{Tr}\left[e^{-H(t)/kT}\right] \tag{170} $$

The mean change in energy is

$$ \Delta E = \mathrm{Tr}[\rho(\tau)H(\tau)] - \mathrm{Tr}[\rho(0)H(0)] \tag{171} $$

The mean heat transferred to the heat bath is

$$ \Delta Q = \Delta W - \Delta E \tag{172} $$

Substituting $E_n = -kT\ln Z - kT\ln p_n$ for canonical thermal systems gives

$$ \Delta Q = kT\left(\sum_n p_n(\tau)\ln p_n(\tau) - \sum_n p_n(0)\ln p_n(0)\right) \tag{173} $$



13 Please note, the specific Hamiltonians used are provided solely to demonstrate the existence of concrete examples. 
2. Non-isothermal processes

While the above relationship may be well known for quasistatic, isothermal processes, we
must consider a more general process. Suppose we have a system that starts in an arbitrary,
uncorrelated state

$$ \rho = \sum_n p_n |\alpha_n\rangle\langle\alpha_n| \tag{174} $$

with Hamiltonian $H_0$. We wish to find the entropy difference with another state

$$ \rho' = \sum_n p'_n |\beta_n\rangle\langle\beta_n| \tag{175} $$

with Hamiltonian $H_f$. We need to find a reversible Hamiltonian process to achieve this.
We break the evolution into three stages, and give the Hamiltonian for each stage.

(a) $0 < t < \tau_1$

Keeping the system isolated, evolve it so that, at $t = \tau_1$, its Hamiltonian is
$H_1 = \sum_n -kT\ln(p_n)\,|\gamma_n\rangle\langle\gamma_n|$. The effect of this evolution is to leave the system in the canonical state

$$ \rho = \frac{e^{-H_1/kT}}{\mathrm{Tr}\left[e^{-H_1/kT}\right]} = \sum_n p_n |\gamma_n\rangle\langle\gamma_n| \tag{178} $$

(b) $\tau_1 < t < \tau_2$

Bring the system into thermal contact with a heat bath at temperature $T$, and isother-
mally, quasi-statically change the Hamiltonian to

$$ H_2 = \sum_n -kT\ln(p'_n)\,|\gamma_n\rangle\langle\gamma_n| \tag{179} $$

At the end of this process, the system is in the state

$$ \rho = \frac{e^{-H_2/kT}}{\mathrm{Tr}\left[e^{-H_2/kT}\right]} = \sum_n p'_n |\gamma_n\rangle\langle\gamma_n| \tag{180} $$

(c) $\tau_2 < t < \tau_f$

Isolate the system again, and vary the Hamiltonian from $H_2$ to the final Hamiltonian $H_f$ while
rotating each $|\gamma_n\rangle$ into $|\beta_n\rangle$. This produces the desired final density matrix $\rho'$.
Stages (a) and (c) are isolated, so involve no heat exchange. Changes in internal energy are
entirely through work performed upon the system. In the limit of ideal heat baths and slow
isothermal processes, the net heat exchange is

$$ \Delta Q = kT\left(\sum_n p'_n\ln p'_n - \sum_n p_n\ln p_n\right) \tag{182} $$
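Equation 182 can be checked against the isothermal bookkeeping of Equations 169 to 172. The sketch below assumes that $H_1$ and $H_2$ share the eigenbasis $|\gamma_n\rangle$, that the levels are interpolated linearly (a hypothetical path), and that the probabilities are hypothetical. It integrates the work for stage (b) with the system canonical throughout, subtracts the change in mean energy, and compares the resulting heat with the formula:

```python
import math

def stage_b_heat(p_start, p_end, kT, steps=4000):
    """Mean heat into the bath while H moves quasi-statically from H_1 to H_2 (stage (b)).

    Assumes H_1 = -kT ln p_n and H_2 = -kT ln p'_n share the eigenbasis |gamma_n>;
    the levels are interpolated linearly (a hypothetical path) and the system is
    kept canonical throughout.
    """
    e1 = [-kT * math.log(p) for p in p_start]
    e2 = [-kT * math.log(p) for p in p_end]
    rates = [b - a for a, b in zip(e1, e2)]
    dt = 1.0 / steps
    work = 0.0
    for k in range(steps):
        e = [a + (k + 0.5) * dt * r for a, r in zip(e1, rates)]
        w = [math.exp(-en / kT) for en in e]
        z = sum(w)
        work += sum(wi / z * r for wi, r in zip(w, rates)) * dt   # Tr[rho dH/dt] dt
    delta_e = sum(p * e for p, e in zip(p_end, e2)) - sum(p * e for p, e in zip(p_start, e1))
    return work - delta_e                                          # Eq. 172

def eq_182(p_start, p_end, kT):
    return kT * (sum(p * math.log(p) for p in p_end) - sum(p * math.log(p) for p in p_start))

p, p_prime, kT = [0.7, 0.2, 0.1], [0.3, 0.3, 0.4], 0.8   # hypothetical values
q_numeric = stage_b_heat(p, p_prime, kT)
q_formula = eq_182(p, p_prime, kT)
```

With this normalisation of $H_1$ and $H_2$, both partition functions equal one, so the work term integrates to zero and the heat is entirely $-\Delta E$.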

The limiting process can clearly take place in either direction. We can therefore deduce that,
for any two density matrices $\rho$ and $\rho'$, the difference in the statistical mechanical generalisation of
their thermodynamic entropy is:

$$ S[\rho'] - S[\rho] = -k\left(\sum_n p'_n\ln p'_n - \sum_n p_n\ln p_n\right) \tag{183} $$

As this relationship

$$ S[\rho'] + k\sum_n p'_n\ln p'_n = S[\rho] + k\sum_n p_n\ln p_n \tag{184} $$

must hold for any density matrices, and with any eigenstates, then

$$ S[\rho] = -k\,\mathrm{Tr}[\rho\ln\rho] + c \tag{185} $$

where c is a universal additive constant which can be set to zero by convention. The Gibbs-von 
Neumann entropy is deduced to be the correct generalisation of the thermodynamic entropy, for 
statistical mechanics. 



7.3 Optimal processes 

The physical principle expressed by the Gibbs-von Neumann entropy is: 

There is no process which takes a system from state A to state B, depositing
mean heats $Q_m$ into thermal systems at temperatures $T_m$, for which

$$ S[B] - S[A] + \sum_m \frac{Q_m}{T_m} < 0 \tag{186} $$

The limiting process, where

$$ S[A] = S[B] + \sum_m \frac{Q_m}{T_m} \tag{187} $$

is the optimal process. It is the process which, on average, generates the least heat.
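A hypothetical finite-step protocol shows how the optimal process is approached. The two-level example and step counts below are assumptions for illustration. Each step suddenly shifts the levels, so that only work is done, and then lets the system fully thermalise at the new levels, so that only heat flows. The mean heat always exceeds the optimal value $kT(G[\rho_{\mathrm{end}}] - G[\rho_{\mathrm{start}}])$, and the excess shrinks as the number of steps grows:

```python
import math

def canonical(energies, kT):
    w = [math.exp(-e / kT) for e in energies]
    z = sum(w)
    return [x / z for x in w]

def G(p):
    return sum(x * math.log(x) for x in p if x > 0)

def stepwise_heat(e_start, e_end, kT, n_steps):
    """Mean heat into an ideal bath for a hypothetical n-step isothermal protocol.

    Each step: jump the levels (populations frozen, so only work is done), then let
    the system fully thermalise at the new levels (only heat flows).
    """
    p = canonical(e_start, kT)
    heat = 0.0
    for k in range(1, n_steps + 1):
        e = [a + k / n_steps * (b - a) for a, b in zip(e_start, e_end)]
        p_new = canonical(e, kT)
        heat += sum(x * en for x, en in zip(p, e)) - sum(x * en for x, en in zip(p_new, e))
        p = p_new
    return heat

e_start, e_end, kT = [0.0, 1.0], [0.0, 3.0], 1.0      # hypothetical two-level protocol
optimal = kT * (G(canonical(e_end, kT)) - G(canonical(e_start, kT)))   # Eq. 187 limit
excess = {n: stepwise_heat(e_start, e_end, kT, n) - optimal for n in (1, 2, 5, 20, 200)}
```

In this model the excess for each step equals $kT$ times the relative entropy between successive canonical states, which is why it is never negative.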

The thermal systems into which heat is transferred are not necessarily ideal heat baths. All 
that is assumed is that they are initially uncorrelated to other systems and that they are described 
by a canonical probability distribution. 

There is no special reason why the mean generation of heat is important. Other criteria may
be considered. In some physical circumstances other properties might be more important. One
might wish to find the optimal process according to some other criterion, such as a minimax criterion
(minimising the maximal cost). Statistical mechanics provides the tools for doing this.

If we search for a process which minimises the mean generation of heat in thermal systems,
we are led to the statistical mechanical entropy as the quantity which characterises the optimal
process. If we search for a process by some other criterion, we will find different quantities of interest
and different processes. These would, necessarily, involve at least as much heat generation, on
average, but would outperform the optimal entropic process according to that other criterion.
7.4 Consistency

Can we be sure this definition of thermodynamic entropy is consistent? Once the identification
of statistical temperature and the ideal gas scale has been made, it is possible to derive the entropy
relationships directly from the properties of the Gibbs-von Neumann measure $G$ and of
canonical distributions. It holds for all processes, because of two properties of Hamiltonian
evolutions:

1. For any process which starts with uncorrelated distributions over a number of systems, the
changes in the value of $G$ for each marginal distribution give

$$ \sum_i \Delta G_i \leq 0 \tag{188} $$

2. For any system initially canonically distributed,

$$ \Delta G + \beta\Delta\langle H\rangle \geq 0 \tag{189} $$

We did not go directly from these properties to the identification of $-kG$ with the thermody-
namic entropy, as we wished to justify precisely what purpose thermodynamic entropy is intended
to fulfil. Having done so, and demonstrated that $-kG$ is the correct value, we can now see that
the result

$$ \sum_i \Delta S_i \geq 0 \tag{190} $$

must hold, provided that the evolution is Hamiltonian and that the systems are initially
uncorrelated.
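Both properties can be spot-checked without assuming any special interaction. The following standard-library sketch makes several assumptions. It evolves two initially uncorrelated qubits with a generic unitary built from random Givens rotations (not Haar-distributed; any unitary suffices here). Qubit A starts in an arbitrary diagonal state and qubit B in a canonical state with hypothetical parameters. The sketch then evaluates Equations 188 and 189 on the exact marginals, using closed-form $2\times 2$ eigenvalues for $G$:

```python
import cmath
import math
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def identity(dim):
    return [[complex(i == j) for j in range(dim)] for i in range(dim)]

def givens(dim, j, k, theta, phi):
    """Unitary mixing basis states j and k; products of these give generic unitaries."""
    u = identity(dim)
    c, s = complex(math.cos(theta)), math.sin(theta)
    u[j][j], u[j][k] = c, -s * cmath.exp(1j * phi)
    u[k][j], u[k][k] = s * cmath.exp(-1j * phi), c
    return u

def random_unitary(dim, rng, layers=3):
    """A generic (not Haar-distributed) unitary built from random Givens rotations."""
    u = identity(dim)
    for _ in range(layers):
        for j in range(dim):
            for k in range(j + 1, dim):
                g = givens(dim, j, k, rng.uniform(0, math.pi), rng.uniform(0, 2 * math.pi))
                u = matmul(g, u)
    return u

def product_state(pa, pb):
    """Uncorrelated, diagonal two-qubit density matrix; basis index is 2*a + b."""
    d = [x * y for x in pa for y in pb]
    return [[complex(d[i]) if i == j else 0j for j in range(4)] for i in range(4)]

def marginal(rho, keep):
    """Partial trace of a two-qubit density matrix, keeping qubit 'A' or 'B'."""
    out = [[0j, 0j], [0j, 0j]]
    for a1 in range(2):
        for b1 in range(2):
            for a2 in range(2):
                for b2 in range(2):
                    value = rho[2 * a1 + b1][2 * a2 + b2]
                    if keep == "A" and b1 == b2:
                        out[a1][a2] += value
                    if keep == "B" and a1 == a2:
                        out[b1][b2] += value
    return out

def G(m):
    """G = Tr[rho ln rho] for a 2x2 density matrix, via its closed-form eigenvalues."""
    a, d, b = m[0][0].real, m[1][1].real, m[0][1]
    r = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return sum(x * math.log(x) for x in ((a + d) / 2 + r, (a + d) / 2 - r) if x > 1e-15)

eps, beta = 1.0, 0.8                        # hypothetical splitting and bath parameter
w = math.exp(-beta * eps)
pb = [1 / (1 + w), w / (1 + w)]             # qubit B: canonically distributed
pa = [0.35, 0.65]                           # qubit A: arbitrary (here inverted) populations
rho0 = product_state(pa, pb)
u = random_unitary(4, random.Random(1))
rho1 = matmul(matmul(u, rho0), dagger(u))   # Hamiltonian evolution = unitary conjugation

A0, B0 = marginal(rho0, "A"), marginal(rho0, "B")
A1, B1 = marginal(rho1, "A"), marginal(rho1, "B")
sum_dG = (G(A1) - G(A0)) + (G(B1) - G(B0))                                          # Eq. 188
canonical_check = (G(B1) - G(B0)) + beta * eps * (B1[1][1].real - B0[1][1].real)   # Eq. 189
```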

7.5 Non-equilibrium statistical mechanics 

The entropy has been derived for arbitrary probability distributions, not only for systems in thermal 
equilibrium. Phenomenological thermodynamic entropy is frequently regarded as only being well 
defined for states in thermal equilibrium. How is it that statistical mechanics could do better? 

There is a subtlety involved in the temperature in Equation 159. This gives the temperatures
of the thermal systems into which heat is transferred. It is not, directly, the temperature of
the system from which heat is being expelled. It is the temperature of the heat baths, not the
temperature of the system undergoing the cyclic process. The phenomenological thermodynamic
entropy function, defined by Equation 161, is therefore defined for all states, whether they are in
thermal equilibrium or not.

To identify the actual value of the entropy difference between two states, it is necessary to
identify reversible processes in both directions. For phenomenological thermodynamics, such pro-
cesses are only known in the limiting case of quasistatic processes on states in thermal equilibrium.
Although the temperature $T$ that appears in the summation is strictly the temperature of the heat
bath, the change in $S$ can usually only be uniquely identified when the system is kept in thermal
equilibrium at the same $T$ as the heat bath (see [Fer37, Chapter IV, Section 11]). This can lead to
the claim that thermodynamic entropy is only well-defined for systems in thermal equilibrium.

While this may, arguably, be true for phenomenological thermodynamics, there is no reason to 
insist that it should also be true for statistical mechanics. Statistical mechanics comes with a well 
defined notion of processes - Hamiltonian evolution - even when systems are not in equilibrium. In 
statistical mechanics, there is no need to artificially restrict the domain of validity of the entropy 
function. 
8 Conclusions



We have come to the conclusion that the Gibbs-von Neumann entropy is the appropriate statisti-
cal mechanical generalisation of thermodynamic entropy. This conclusion is reached based upon
three considerations: that the dynamics of the system are Hamiltonian; that a probabilistic de- 
scription is meaningful; and that thermal states are physically represented by canonical probability 
distributions. 

No assumptions were required regarding whether thermal systems are subsystems of a large, 
microcanonically distributed system. Consequently, no assumptions regarding ergodicity or mixing 
are required. No assumptions regarding the size of the systems are involved, so no conclusions 
depend upon, or only hold true in, the thermodynamic limit. We have not assumed that the proba- 
bility distribution only applies to microscopic degrees of freedom. Should probability distributions 
over macroscopically distinct states [Pen70] arise, the arguments still hold. Of particular impor-
tance, no restriction is placed on its applicability to thermal systems or to systems in equilibrium. The
arguments that identify the Gibbs-von Neumann entropy for thermal systems, apply universally. 

Beyond the use of a probabilistic description itself, no assumptions were made regarding entropy
having a relationship to knowledge or information. It is quite unnecessary to consider information
theory or the properties of Shannon information, and no relationship between thermal states and
maximal ignorance need be assumed. Describing the Gibbs-von Neumann entropy as information
theoretic seems unjustified, if not downright anachronistic!^14

The Gibbs-von Neumann entropy is shown to be precisely the generalisation of thermodynamic
entropy that one should expect in statistical mechanics. The generalisation is from

There is no process which takes a system from state A to state B, depositing heats $Q_m$ into
thermal systems at temperatures $T_m$, for which

$$S[B] - S[A] + \sum_m \frac{Q_m}{T_m} < 0 \tag{191}$$

to 

There is no process which takes a system from state A to state B, depositing mean heats
$\langle Q_m \rangle$ into thermal systems at temperatures $T_m$, for which

$$S[B] - S[A] + \sum_m \frac{\langle Q_m \rangle}{T_m} < 0 \tag{192}$$

This generalisation recognises that every classical statement of the second law of thermodynamics
can, with some probability, be violated by an arbitrary amount. The restriction expressed by the
entropy function is not a bound on the minimum heat generated by the Hamiltonian evolution of a
given microstate, but a bound on the minimum expectation value of the heat generated by
Hamiltonian flows.

There remain many open questions in the understanding of statistical mechanics [Uff06]. We
would like to explore how statistical mechanics may account for them, but this paper is already far
too long. Of particular importance is the exploration of how the fine-grained Gibbs-von Neumann
entropy accounts for the appearance of irreversibility and time-asymmetry. On the question of
whether this entropy is subjective, and whether 'observers' may affect the entropy of a system in
the manner of a Maxwellian Demon, see [Mar02]. There are many other measures, microscopic and
macroscopic, of probability distributions and of individual states that are presented as 'entropies'.
While they may have useful roles to play, the question is: are they the statistical mechanical
generalisation of thermodynamic entropy?

14 The Gibbs entropy appears in 1902 [Gib02] and the generalisation to quantum theory in 1932 [Neu32]. The
equivalent term does not appear in information theory until 1948 [Sha48] and is not generalised to quantum
theory until 1994 [JS94].

Once the canonical distribution is accepted as appropriate for thermal states, the Gibbs-von
Neumann entropy follows inevitably for all probability distributions, microscopic or macroscopic.
In Section 5 it is shown that the canonical distribution can be uniquely identified solely from
the observed properties of thermal states themselves. The relationship

$$\sum_m \frac{\langle Q_m \rangle}{T_m} \geq 0 \tag{193}$$

for closed cycles is then a derived property of Hamiltonian dynamics, and serves to define the
statistical mechanical generalisation of entropy in the same way that the Clausius relationship
defines phenomenological entropy. To quote a recent paper:

The rule . . . that associates heat transfer with entropy holds only for thermodynamic 
entropy and, indeed, defines it. No other entropy can satisfy it without at once also 
being thermodynamic entropy. [Nor05]
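As a purely illustrative numerical check, and not a reconstruction of the derivation in the main text, the Clausius-type relation can be seen at work for canonical states. The sketch below assumes a hypothetical three-level system with fixed energy levels, units with $k_B = 1$, and a quasi-static path through canonical distributions from temperature $T_1$ to $T_2$. Because the Hamiltonian is held fixed, no work is done and the heat absorbed by the system in each step equals the change in mean energy. For a canonical state the Gibbs-von Neumann entropy (which, for a state diagonal in the energy basis, reduces to $-k_B \sum_i p_i \ln p_i$) then satisfies $dS = d\langle E \rangle / T$, so its change should match the accumulated $\sum \delta Q / T$. The function names, energy levels and temperatures are hypothetical choices made only for this sketch.

```python
import math

K_B = 1.0  # assumption: units in which Boltzmann's constant is 1


def canonical(levels, T):
    """Canonical probabilities p_i = exp(-E_i / kT) / Z for a discrete spectrum."""
    e0 = min(levels)  # shift energies for numerical stability; normalisation removes the shift
    weights = [math.exp(-(E - e0) / (K_B * T)) for E in levels]
    Z = sum(weights)
    return [w / Z for w in weights]


def gibbs_entropy(p):
    """S = -k sum_i p_i ln p_i, the von Neumann entropy of a state diagonal in the energy basis."""
    return -K_B * sum(x * math.log(x) for x in p if x > 0.0)


def mean_energy(levels, p):
    """Expectation value <E> of the energy in the distribution p."""
    return sum(E * x for E, x in zip(levels, p))


def heat_over_temperature(levels, T1, T2, steps=20000):
    """Accumulate dQ/T along a quasi-static path of canonical states from T1 to T2.

    The energy levels (the Hamiltonian) are held fixed, so no work is done and
    the heat absorbed by the system in each step is the change in <E>.
    """
    total = 0.0
    dT = (T2 - T1) / steps
    E_prev = mean_energy(levels, canonical(levels, T1))
    for k in range(1, steps + 1):
        T = T1 + k * dT
        E_next = mean_energy(levels, canonical(levels, T))
        total += (E_next - E_prev) / (T - 0.5 * dT)  # divide by the step's midpoint temperature
        E_prev = E_next
    return total


levels = [0.0, 1.0, 2.5]  # hypothetical energy levels
T_low, T_high = 0.5, 3.0
dS = gibbs_entropy(canonical(levels, T_high)) - gibbs_entropy(canonical(levels, T_low))
Q_over_T = heat_over_temperature(levels, T_low, T_high)
print(f"Delta S = {dS:.6f}, sum of dQ/T = {Q_over_T:.6f}")
```

Running the same path in reverse returns the accumulated $\sum \delta Q / T$ to zero up to rounding, the equality case of a reversible closed cycle. Note that here $\delta Q$ is heat absorbed by the system, the opposite sign convention to the heats deposited into thermal systems in (192) and (193).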